Abstract

This  research  was  performed  to  expand  AFIT’s  Radio  Frequency  “Distinct  Native 
Attribute” (RF-DNA) fingerprinting process to support IEEE 802.15.4 ZigBee communication
network applications. Current ZigBee bit-level security measures include use of
network  keys  and  Media  Access  Control  (MAC)  lists  which  can  be  subverted  through 
interception  and  spoofing  using  open-source  hacking  tools.  This  work  addresses  device 
discrimination  using  Physical  (PHY)  waveform  alternatives  to  augment  existing  bit-level 
security  mechanisms.  ZigBee  network  vulnerability  to  outsider  threats  was  assessed  using 
Receiver  Operating  Characteristic  (ROC)  curves  to  characterize  both  Authorized  Device 
ID  Verification  performance  (granting  network  access  to  authorized  users  presenting  true 
bit-level  credentials)  and  Rogue  Device  Rejection  performance  (denying  network  access  to 
unauthorized  rogue  devices  presenting  false  bit-level  credentials). 

Radio  Frequency  ‘Distinct  Native  Attribute’  (RF-DNA)  features  are  extracted  from 
time-domain  waveform  responses  of  2.4  GHz  CC2420  ZigBee  transceivers  to  enable 
human-like  device  discrimination.  The  fingerprints  were  constructed  using  a  “hybrid” 
pool  of  emissions  collected  under  a  range  of  conditions,  including  anechoic  chamber 
and  an  indoor  office  environment  where  dynamic  multi-path  and  signal  degradation 
factors  were  present.  The  RF-DNA  fingerprints  were  input  to  a  Multiple  Discriminant 
Analysis,  Maximum  Likelihood  (MDA/ML)  discrimination  process  and  a  1  vs.  many 
“Looks  most  like?”  classification  assessment  made.  The  hybrid  MDA  model  was 
also  used  for  1  vs.  1  “Looks  how  much  like?”  verification  assessment.  ZigBee 
Device  Classification  performance  was  assessed  using  both  full  and  reduced  dimensional 
fingerprint  sets.  Reduced  dimensional  subsets  were  selected  using  Dimensional  Reduction 
Analysis (DRA) by rank ordering 1) pre-classification Kolmogorov-Smirnov (KS)-Test
p-values and 2) post-classification Generalized Relevance Learning Vector Quantization-Improved
(GRLVQI) $\lambda_i$ feature relevance values. Assessment of ZigBee device ID
verification capability included both Authorized Device ID Verification and Rogue Device
Rejection.

Device Classification performance using full-dimensional fingerprints comprised
of $N_F$=729 features achieved an arbitrary benchmark of average correct classification
%C>90% (across all devices) for SNR≥10.0 dB. Performance using DRA≈66% ($N_F$=243)
reduced dimensional subsets was marginally poorer and yielded a “gain” of G≈-1.0 dB at
%C=90% relative to full-dimensional performance; gain is the reduction in required SNR
for two systems, methods, etc., to achieve a given %C. Additional KS-Test and GRLVQI
DRA feature selection was performed and classification performance assessed using the
top-ranked $N_F$=200, 100, 50, and 25 features. Relative to the %C>90% benchmark, the
KS-Test and GRLVQI selected feature sets required the same SNR, ranging from SNR≈10.0 dB
($N_F$=243) to SNR≈18.0 dB ($N_F$=50). For $N_F$=25, KS-Test selected features failed to meet the
benchmark while GRLVQI selected features achieved the benchmark at SNR≈30.0 dB.

Authorized Device ID Verification performance was evaluated using the $N_F$=50 DRA
feature set. Results indicate the existence of a device dependent threshold whereby all
authorized devices achieve an arbitrary True Verification Rate (TVR>90%) and False
Verification Rate (FVR<10%) benchmark for both DRA methods. Rogue Device Rejection
was assessed using unauthorized rogue devices, with each rogue device falsely presenting
a claimed ID matching each of the authorized device IDs. Considering an arbitrary
Rogue Rejection Rate (RRR>90%) benchmark, ROC curve analysis for Rogue Device
Rejection indicated that performance using KS-Test and GRLVQI selected feature sets was
consistent. The KS-Test DRA selected feature sets achieved RRR>90% in 21, 29, and 30 of
36 rogue scenarios using $N_F$=100, 50, and 25 top-ranked features, respectively. Similarly,
the GRLVQI DRA selected features achieved RRR>90% in 23, 28, and 30 of the 36 rogue
scenarios using $N_F$=100, 50, and 25 top-ranked features, respectively.

USING RF-DNA FINGERPRINTS TO DISCRIMINATE ZIGBEE DEVICES IN AN OPERATIONAL ENVIRONMENT


I.  Introduction 


1.1  Operational  Motivation 

Wireless Personal Area Networks (WPANs) are increasing in popularity and are widely
deployed in office buildings, factories, home networks, and hospitals. The Institute of
Electrical and Electronics Engineers (IEEE) 802.15.4 Media Access Control (MAC) and
Physical-layer (PHY) standards provide a low-power, low-data-rate WPAN foundation
on which network (NWK) and application (APL) layers are built, such as the ZigBee
specification [26]. ZigBee networks’ low implementation costs and low complexity make
them a viable solution for applications such as industrial control and monitoring [14], home
automation, remote metering [46], patient vital sign monitoring [7], security systems [45],
and asset tracking [50]. Depending on the application, ZigBee networks transmit sensitive
personal information, control physical systems (valves, fans, lighting, doors, etc.), and
monitor critical sensors. Improved security measures are an essential component in allowing
ZigBee-based networks to be highly reliable and secure. The need for improving network
security is motivated by open-source tools such as KillerBee [49] and Api-do [41],
which increase ZigBee network vulnerability and enable unauthorized rogue devices to
conduct packet replay, network key sniffing, MAC address spoofing, malicious network
impersonation, and denial-of-service type attacks.

Wireless networks are characterized by the seven-layer Open Systems Interconnect
(OSI) model shown in Fig. 1.1 [1]. Traditionally, systems have predominantly
relied on “bit-level” security mechanisms implemented in the Network (NWK) and
Data Link (DLL) layers while generally ignoring the potential for PHY-layer security
augmentation. Exploiting this potential has been a major motivation for ongoing research
at the Air Force Institute of Technology (AFIT), which exploits wireless device PHY waveform
features. This is accomplished using Radio Frequency ‘Distinct Native Attribute’ (RF-DNA)
fingerprints which provide unique, human-like device discrimination using RF-DNA
features that vary due to component manufacturing differences, component tolerances,
design differences, and device aging. The inherent RF-DNA is difficult to mimic and
replicate, making it useful for discriminating between multiple devices. PHY-layer
security using RF-DNA fingerprints is a viable solution for augmenting higher-layer (NWK
and DLL) bit-level security mechanisms.

Figure 1.1: Multi-layer Open Systems Interconnect (OSI) network model [1].

1.2 Technical Motivation


AFIT’s  RF-DNA  fingerprinting  process  has  evolved  into  the  process  shown  in  Fig.  1.2. 
This  process  is  constantly  expanding  by  considering  new  signal  types,  new  feature  types, 
new  classification  methods,  and  new  device  ID  verification  methods.  Over  the  past  several 
years,  extensive  research  has  been  conducted  at  AFIT  [21,  23,  24,  28-30,  34,  35,  37- 
40,  42,  47,  48]  and  contribution  has  been  made  to  a  larger  body  of  research  being 
conducted  by  numerous  researchers  [8-10,  16-19,  27].  AFIT’s  research  activity  has 
predominantly focused on RF-DNA fingerprinting for Device Classification using various
wireless  communication  signal  types,  such  as  Global  System  for  Mobile  Communication 
(GSM)  cellular  phones  [40,  47],  IEEE  802.11  WiFi  [21,  23,  24,  28,  29,  35,  42],  and  IEEE 
802.16  WiMAX  [34,  35,  37,  38,  48].  This  research  is  no  exception  and  the  RF-DNA 
process  is  adopted  here  to  assess  IEEE  802.15.4  ZigBee  Device  Classification.  However, 
there  has  been  a  recent  shift  in  AFIT  research  and  this  research  is  among  the  first  few  efforts 
to  consider  Device  ID  Verification  using  RF-DNA  fingerprints. 

Figure 1.2: AFIT’s RF-DNA Fingerprinting Overview.

1.3 Previous vs. Current Research


Table  1.1  provides  a  summary  of  technical  areas  that  have  been  previously  addressed 
and  areas  addressed  under  this  research. 


Table 1.1: Technical areas in previous related work and current research contributions.
The X symbol denotes areas addressed.


Technical Area                          Previous Work                                             Current Research
                                        Addressed  Ref #                                          Addressed  Ref #

1D Time Domain (TD)                     X          [8, 17, 28, 29, 43, 47], [42, 43, 47, 48]      X          [11, 12]
1D Spectral Domain (SD)                 X          [38, 48]
2D Wavelet Domain (WD)                  X          [28-30]
2D Gabor (GT/GWT)                       X          [21, 34, 35, 37, 38]

Signal Type
802.11a WiFi                            X          [21, 28-30, 35, 48]
GSM Cellular                            X          [39, 40, 47]
802.16e WiMAX                           X          [34, 35, 38, 48]
802.15.4 ZigBee                         X          [31]                                           X          [11, 12]

Classifier Type
MDA/ML                                  X          [28-30, 42, 43, 47, 48], [21, 31, 34, 38-40]   X          [11, 12]
GRLVQI                                  X          [21, 28, 29, 35, 37]
LFS                                     X          [4-6, 21-24]

Dimensional Reduction Analysis (DRA)
GRLVQI                                  X          [28, 29, 33, 35, 37]                           X          [12]
LFS                                     X          [20, 21]
KS-Test                                 X          [31]                                           X          [12]

Device ID Verification
Authorized Device                       X          [35, 37]                                       X          [12]
Rogue Device Rejection                  X          [35, 37]                                       X          [12]

1.4 Document Organization

The  remainder  of  this  document  is  organized  as  follows: 

•  Chapter  2  -  Background:  Provides  fundamental  information  on  ZigBee  IEEE 
802.15.4  signal  structure.  Describes  the  previously  established  procedure  for 
extracting  time-domain  features.  Explains  Multiple  Discriminant  Analysis 
(MDA)  model  development  and  Maximum  Likelihood  (ML)  Classification. 

•  Chapter  3  -  Research  Methodology:  Describes  the  specific  methodology  used 
in  this  research  to  implement  RF-DNA  fingerprinting  using  experimentally 
collected  ZigBee  emissions,  including  emission  collection  and  post-collection 
processing. Describes RF-DNA fingerprint quantitative Dimensional Reduction
Analysis (DRA) methods, including: 1) pre-classification KS-Test p-value
ranking, and 2) post-classification GRLVQI $\lambda_i$ relevance ranking. Details the
methodology used to perform ZigBee device discrimination, including Device
Classification, Authorized Device ID Verification, and Rogue Device Rejection.

•  Chapter 4 - Results and Analysis: Provides results and performance analysis
for full-dimensional and DRA reduced dimensional RF-DNA fingerprinting
using KS-Test and GRLVQI selected feature sets. This includes assessment of
Device Classification, Authorized Device ID Verification, and Rogue Device
Rejection capability.

•  Chapter  5  -  Summary  and  Conclusions:  Presents  a  summary  of  research 
activity,  significant  results,  and  recommendations  for  future  research. 

II. Background


This chapter provides the technical background information supporting development
of the methodology described in Chap. 3 and interpretation of results presented
in Chap. 4. Section 2.1 provides details for ZigBee-based networks built on the IEEE
802.15.4 standard for low-data-rate Wireless Personal Area Networks (WPANs).
Section 2.2 explains the process for generating RF-DNA fingerprints that are composed
of statistical features extracted from time-domain emission responses. Multiple
Discriminant Analysis (MDA) model development and Maximum Likelihood
(ML) classification are described in Sections 2.3 and 2.4, respectively, and are
the foundation for the MDA/ML processing used in developing the Chap. 3 methodology.

2.1  ZigBee  Signal  Structure 

ZigBee  technology  is  used  for  WPANs  and  is  seen  in  many  applications  requiring 
a  low  data  rate,  long  battery  life,  and  low  cost  solution.  These  applications  include 
home  automation,  industrial  control  and  monitoring,  remote  sensing/metering,  medical 
equipment  and  patient  monitoring,  asset  tracking  systems,  security  systems,  lighting  and 
temperature  control,  etc.  ZigBee-based  networks  are  built  on  the  WPAN  IEEE  802.15.4 
standard  which  defines  the  Physical  (PHY)  and  Media  Access  Control  (MAC)  layer 
structure.  The  ZigBee  specification  [51]  defines  the  Network  (NWK)  layer  specifications 
and  provides  a  framework  for  application  programming  in  the  Application  (APL)  layer. 

Figure 2.1 shows the MAC frame format and PHY layer structure used by ZigBee [26].
As described in the 2.4 GHz IEEE 802.15.4 standard, the PHY Protocol Data Unit (PPDU)
packet structure consists of 1) a Synchronization Header (SHR) response which allows a
receiving device to synchronize and lock onto the bit stream, 2) a PHY Header (PHR)
response which contains frame length information, and 3) a variable length payload which
carries the MAC sublayer frame. The SHR region in Fig. 2.2 is composed of a 32-bit
preamble and an 8-bit Start-of-Frame Delimiter (SFD) sequence. The preamble sequence
is designed for acquisition of symbol chip timing and is composed of a 32-bit binary
zero string. The SFD region is used to signify the end of the preamble and consists of a
predefined 8-bit sequence of [1 1 1 0 0 1 0 1]. Information contained within the SHR region
remains constant and is independent of device emissions, individual device types, device
applications, etc. Early research reported in [11] exploited the preamble-only region of
ZigBee emissions for RF-DNA fingerprinting. Subsequent analysis revealed that a greater level
of device discrimination can be realized using the entire SHR region (preamble and SFD).
Thus, the methodology described in Chap. 3 and results in Chap. 4 are based exclusively
on RF-DNA extracted from the SHR region.


Figure 2.1: Data frame PHY and MAC layer structures for a ZigBee packet [26].


2.2  Time-Domain  RF-DNA  Fingerprint  Generation 

The RF-DNA fingerprints for an emission Time Domain (TD) response are derived
from its instantaneous amplitude ($a$), phase ($\phi$), and frequency ($f$) responses, as described
in [11, 12, 30, 33, 39, 40, 43, 48]. The corresponding characteristic sequences, having
elements denoted by $a[n]$, $\phi[n]$, and $f[n]$, are generated using $N_S$ complex I-Q signal
samples $s[n]=s_I[n]+js_Q[n]$ from the specific Region Of Interest (ROI) in the collected
signal, where the mean value is removed (centered) and then normalized (division by
maximum value) [30, 43]. Elements of the emission TD response are calculated by,

$$a[n] = \sqrt{s_I^2[n] + s_Q^2[n]} , \tag{2.1}$$

$$\phi[n] = \tan^{-1}\left[\frac{s_Q[n]}{s_I[n]}\right] , \quad \text{for } s_I[n] \neq 0 , \tag{2.2}$$

$$f[n] = \frac{1}{2\pi}\left[\frac{d\phi[n]}{dt}\right] . \tag{2.3}$$

Figure 2.2: PHY Protocol Data Unit (PPDU) packet structure for IEEE 802.15.4 [26].

Mean removal and normalization for each of the $N_S$ elements in characteristic
sequences $\{a[n]\}$, $\{\phi[n]\}$, and $\{f[n]\}$ is achieved using,

$$\bar{a}_c[n] = \frac{a[n] - \mu_a}{\max_n\{a_c[n]\}} , \tag{2.4}$$

$$\bar{\phi}_c[n] = \frac{\phi[n] - \mu_\phi}{\max_n\{\phi_c[n]\}} , \tag{2.5}$$

$$\bar{f}_c[n] = \frac{f[n] - \mu_f}{\max_n\{f_c[n]\}} , \tag{2.6}$$

where $n = 1, 2, 3, \ldots, N_S$, the subscript $c$ denotes a centered (mean-removed) sequence,
$\mu_a$, $\mu_\phi$, and $\mu_f$ are the means of $\{a[n]\}$, $\{\phi[n]\}$, and $\{f[n]\}$
calculated across $N_S$ samples, and $\max\{\cdot\}$ denotes the maximum value of each feature
sequence’s centered magnitude.
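
The calculations in (2.1) through (2.6) can be sketched in a few lines of Python. This is only an illustrative sketch, not the processing code used in this research: the function names, the sample-rate argument `fs`, the use of `atan2` (which extends (2.2) to all four quadrants), the phase unwrapping step, and the first-difference approximation of the derivative in (2.3) are all assumptions made for the example.

```python
import cmath
import math

def instantaneous_responses(iq, fs):
    """Return amplitude a[n], phase phi[n], and frequency f[n] for complex I-Q samples.

    fs is an assumed sample rate in Hz; f[n] has one fewer element than a[n]
    because the derivative in (2.3) is approximated by a first difference.
    """
    a = [abs(s) for s in iq]                               # (2.1)
    phi = [math.atan2(s.imag, s.real) for s in iq]         # (2.2), four-quadrant form
    # Unwrap the phase so the discrete derivative has no artificial 2*pi jumps.
    unwrapped = [phi[0]]
    for p in phi[1:]:
        step = p - unwrapped[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        unwrapped.append(unwrapped[-1] + step)
    f = [(unwrapped[n + 1] - unwrapped[n]) * fs / (2 * math.pi)   # (2.3)
         for n in range(len(unwrapped) - 1)]
    return a, phi, f

def center_and_normalize(x):
    """Remove the mean, then divide by the maximum centered magnitude, as in (2.4)-(2.6)."""
    mu = sum(x) / len(x)
    centered = [v - mu for v in x]
    peak = max(abs(v) for v in centered)
    return [v / peak for v in centered] if peak > 0 else centered

if __name__ == "__main__":
    # Hypothetical test tone: 1 kHz complex exponential sampled at 8 kHz.
    tone = [cmath.exp(2j * math.pi * 1000.0 * n / 8000.0) for n in range(64)]
    a, phi, f = instantaneous_responses(tone, 8000.0)
    print(a[0], f[0])
    print(center_and_normalize([1.0, 2.0, 3.0, 4.0]))
```

For a constant-frequency tone the amplitude sequence is flat and the frequency sequence sits at the tone frequency, which makes the sketch easy to sanity check before applying it to collected bursts.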

Figure 2.3: Representative illustration of regional fingerprint marker generation for an
arbitrary ROI sequence using $N_R+1$ total subregions and $N_M$=4 statistical metrics [33].

RF-DNA fingerprints are composed of statistical features extracted from instantaneous
TD responses over a specific ROI in the collected signal [11, 12, 30, 33, 39, 40, 43,
48]. The selected ROI is a response region that is 1) ideally consistent across all collected
signals, and 2) independent of data modulation and device ID information. As shown in
Fig. 2.3, statistical RF-DNA features of standard deviation ($\sigma$), variance ($\sigma^2$), skewness
($\gamma$), and kurtosis ($\kappa$) are calculated over the ROI to form regional fingerprint markers generated
by: 1) dividing each selected characteristic sequence ($\{a[n]\}$, $\{\phi[n]\}$, and $\{f[n]\}$) into
$N_R$ contiguous, equal length subsequences such that $N_S/N_R$ is an integer, 2) calculating $N_M$
metrics for each subsequence, plus the entire fingerprinted region as a whole ($N_R+1$ total
regions), and 3) arranging the metrics in a vector of the form,

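$$\mathbf{F}_{R_i} = \left[\sigma_{R_i} \;\; \sigma^2_{R_i} \;\; \gamma_{R_i} \;\; \kappa_{R_i}\right]_{1 \times 4} , \tag{2.7}$$

where $i = 1, 2, \ldots, N_R+1$. The $N_M$ metrics for each subsequence are calculated from,

$$\mu = \frac{1}{N}\sum_{n=1}^{N} x[n] , \tag{2.8}$$

$$\sigma = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(x[n]-\mu\right)^2} , \tag{2.9}$$

$$\sigma^2 = \frac{1}{N}\sum_{n=1}^{N}\left(x[n]-\mu\right)^2 , \tag{2.10}$$

$$\gamma = \frac{1}{N\sigma^3}\sum_{n=1}^{N}\left(x[n]-\mu\right)^3 , \tag{2.11}$$

$$\kappa = \frac{1}{N\sigma^4}\sum_{n=1}^{N}\left(x[n]-\mu\right)^4 , \tag{2.12}$$

where $x[n]$ is the $n$-th feature vector element and $N$ is the total number of samples in each
subsequence used to calculate the statistic.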

The marker vectors from (2.7) are concatenated to form the composite characteristic
vector for each characteristic and are given by,

$$\mathbf{F} = \left[\mathbf{F}_{R_1} \,\vdots\, \mathbf{F}_{R_2} \,\vdots\, \mathbf{F}_{R_3} \,\cdots\, \mathbf{F}_{R_{N_R+1}}\right]_{1 \times [N_M \times (N_R+1)]} \tag{2.13}$$

If only one signal characteristic is used ($a$, $\phi$, or $f$), the expression in (2.13) represents the
final classification fingerprint. When all $N_C = 3$ signal characteristics are used, the final RF
fingerprint is generated by concatenating vectors from (2.13) according to

$$\mathbf{F} = \left[\mathbf{F}^{a} \,\vdots\, \mathbf{F}^{\phi} \,\vdots\, \mathbf{F}^{f}\right]_{1 \times [N_M \times (N_R+1) \times N_C]} \tag{2.14}$$

The final full-dimensional RF fingerprint (2.14) is a vector comprised of $N_F$ features, where

$$N_F = N_M \times (N_R + 1) \times N_C \tag{2.15}$$
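
The marker generation and concatenation steps in (2.7) through (2.15) can be illustrated with the following sketch. It is a hedged example rather than the thesis implementation: the function names and the synthetic input sequences are hypothetical, each subsequence is assumed to be non-constant (so that $\sigma > 0$), and the $N_M=4$ metric set follows Fig. 2.3.

```python
import math

def region_stats(x):
    """Return [sigma, variance, skewness, kurtosis] per (2.9)-(2.12) for one region."""
    n = len(x)
    mu = sum(x) / n                                             # (2.8)
    var = sum((v - mu) ** 2 for v in x) / n                     # (2.10)
    sigma = math.sqrt(var)                                      # (2.9)
    skew = sum((v - mu) ** 3 for v in x) / (n * sigma ** 3)     # (2.11)
    kurt = sum((v - mu) ** 4 for v in x) / (n * sigma ** 4)     # (2.12)
    return [sigma, var, skew, kurt]

def characteristic_vector(seq, n_regions):
    """Concatenate markers from N_R equal subregions plus the whole ROI, as in (2.13)."""
    if len(seq) % n_regions != 0:
        raise ValueError("N_S / N_R must be an integer")
    size = len(seq) // n_regions
    regions = [seq[i * size:(i + 1) * size] for i in range(n_regions)]
    regions.append(seq)                  # region N_R + 1 is the entire ROI
    vec = []
    for region in regions:
        vec.extend(region_stats(region))
    return vec

def fingerprint(sequences, n_regions):
    """Concatenate the per-characteristic vectors (a, phi, f), as in (2.14)."""
    out = []
    for seq in sequences:
        out.extend(characteristic_vector(seq, n_regions))
    return out

if __name__ == "__main__":
    # Three synthetic stand-ins for the centered and normalized a, phi, and f sequences.
    seqs = [[math.sin(0.37 * n + k) + 0.01 * n for n in range(40)] for k in range(3)]
    fp = fingerprint(seqs, n_regions=4)
    print(len(fp))   # N_M x (N_R + 1) x N_C features, per (2.15)
```

With $N_M=4$, $N_R=4$, and $N_C=3$ in this toy example, (2.15) gives $N_F = 4 \times 5 \times 3 = 60$ features.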

2.3  Multiple  Discriminant  Analysis  (MDA) 

The  research  methodology  presented  in  Chap.  3  is  based  on  fundamental  MDA 
concepts  described  in  this  section.  MDA  is  a  linear  method  of  projecting  high-dimensional 
data  into  a  lower-dimensional  space  that  best  separates  data  in  a  least-squares  sense  [13]. 
MDA  is  performed  on  RF-DNA  fingerprints  to  reduce  the  feature  dimensionality  and  aid 
in the development of a class (device) specific model as described in Section 3.5.1.

MDA is an extension of Fisher’s Linear Discriminant process when discrimination of
two or more classes is required ($N_C>2$). MDA reduces input feature dimensionality by
projecting $N_F$-dimensional input features into an ($N_C-1$)-dimensional subspace, where it
is assumed that $N_F>N_C$. This linear transformation (projection) is performed with the goal
of maximizing the out-of-class separation (class mean differences) and minimizing
within-class spread (variance within each class) of input data projections [13].

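The out-of-class (inter-class, $\mathbf{S}_b$) and within-class (intra-class, $\mathbf{S}_w$) scatter matrices in
MDA are computed as [44],

$$\mathbf{S}_b = \sum_{i=1}^{N_C} P_i \left(\boldsymbol{\mu}_i - \boldsymbol{\mu}_0\right)\left(\boldsymbol{\mu}_i - \boldsymbol{\mu}_0\right)^T , \tag{2.16}$$

$$\mathbf{S}_w = \sum_{i=1}^{N_C} P_i \boldsymbol{\Sigma}_i , \tag{2.17}$$

with class covariance ($\boldsymbol{\Sigma}_i$) and global mean ($\boldsymbol{\mu}_0$) calculated as follows,

$$\boldsymbol{\Sigma}_i = E\left[\left(\mathbf{x} - \boldsymbol{\mu}_i\right)\left(\mathbf{x} - \boldsymbol{\mu}_i\right)^T\right] , \tag{2.18}$$

$$\boldsymbol{\mu}_0 = \sum_{i=1}^{N_C} P_i \boldsymbol{\mu}_i , \tag{2.19}$$

where $\boldsymbol{\mu}_i$ is the mean and $P_i$ is the prior probability of each of the $N_C$ classes. The within-class
scatter matrix in (2.17) provides a measure of probability-weighted class feature variance
and the out-of-class scatter matrix in (2.16) provides a measure of the average (over all
classes) distance of individual class means from the global mean.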

The $N_F$-dimensional input RF-DNA fingerprint vectors, $\mathbf{F}$ from (2.13), are projected
into the lower ($N_C-1$)-dimensional subspace using,

$$\mathbf{f} = \mathbf{W}^T\mathbf{F} , \tag{2.20}$$

where $\mathbf{W}$ is the $N_F \times (N_C-1)$ transformation (projection) matrix formed from the $N_C-1$
eigenvectors of $\mathbf{S}_w^{-1}\mathbf{S}_b$, and $\mathbf{f}$ is the projected RF-DNA fingerprint. This linear projection
by matrix $\mathbf{W}$ results in the optimal ratio between inter-class distances and intra-class
variances [44]. Figure 2.4 shows two possible representative MDA projection
transformations ($\mathbf{W}_1$ and $\mathbf{W}_2$) for $N_C$=3 classes onto a 2-dimensional subspace; for this
illustration $\mathbf{W}_1$ provides the “best” class separation.


Figure 2.4: Representative projections for $N_C$=3 classes projected onto 2-dimensional
subspaces using $\mathbf{W}_1$ and $\mathbf{W}_2$ [13]; $\mathbf{W}_1$ provides better class separation in this case.
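
As a small illustration of the scatter matrices and projection described in this section, the sketch below treats the simplest special case of two equally likely classes with 2-dimensional features. In that case $\mathbf{S}_w^{-1}\mathbf{S}_b$ has rank one and its single useful eigenvector is proportional to $\mathbf{S}_w^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)$, so no general eigen-solver is needed. The helper names and the toy data are hypothetical, and this is not the MDA implementation used in this research, which handles more classes and higher-dimensional fingerprints.

```python
def mean_vec(rows):
    """Sample mean of a list of 2-D feature vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]

def cov2(rows, mu):
    """2x2 sample covariance about mean mu, per (2.18)."""
    n = len(rows)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j] / n
    return c

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def fisher_projection(class_a, class_b):
    """Return (w, S_w, S_b) for two equally likely classes (P_i = 1/2)."""
    mu_a, mu_b = mean_vec(class_a), mean_vec(class_b)
    mu_0 = [(mu_a[k] + mu_b[k]) / 2 for k in range(2)]                # (2.19)
    cov_a, cov_b = cov2(class_a, mu_a), cov2(class_b, mu_b)
    s_w = [[(cov_a[i][j] + cov_b[i][j]) / 2 for j in range(2)]       # (2.17)
           for i in range(2)]
    s_b = [[0.0, 0.0], [0.0, 0.0]]
    for mu in (mu_a, mu_b):                                           # (2.16)
        d = [mu[0] - mu_0[0], mu[1] - mu_0[1]]
        for i in range(2):
            for j in range(2):
                s_b[i][j] += 0.5 * d[i] * d[j]
    # Two-class case: the eigenvector of S_w^{-1} S_b is S_w^{-1} (mu_a - mu_b).
    w = matvec(inv2(s_w), [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]])
    return w, s_w, s_b

if __name__ == "__main__":
    a = [[0.0, 0.0], [1.0, 0.2], [0.2, 1.0], [1.1, 1.3]]
    b = [[3.0, 3.0], [4.0, 3.1], [3.2, 4.1], [4.2, 4.0]]
    w, s_w, s_b = fisher_projection(a, b)
    # Projected (1-D) fingerprints f = w^T F, as in (2.20).
    print([w[0] * x + w[1] * y for x, y in a + b])
```

Projecting each 2-D vector onto `w` yields a 1-dimensional value, matching the ($N_C-1$)-dimensional subspace for $N_C=2$.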


2.4  Maximum  Likelihood  (ML)  Classification 

This  section  describes  the  ML  classification  process  used  in  the  research  methodology 
described in Chapter 3. When considering $N_C>2$ classes comprised of $N_F$-dimensional
input  features,  ML  classification  can  be  performed  using  an  MDA-based  model  described 
in  Sect.  2.3;  the  “model”  consists  of  projection  matrix  W.  The  available  input  data  set  for 
each  of  the  Nc  classes  is  divided  into  Training  and  Testing  data  sets,  with  the  Training  set 
used  for  MDA  model  development  per  Sect.  2.3  and  Testing  set  used  for  ML  classification. 

For ML classification, the MDA model ($\mathbf{W}$) is first used to project the Training set for
all $N_C$ classes into the Fisher space. Class specific projected means ($\hat{\boldsymbol{\mu}}_i$) and covariances
($\hat{\boldsymbol{\Sigma}}_i$) are then computed for $i = 1, 2, \ldots, N_C$. The projected data is assumed to be multivariate
Gaussian (MVG) distributed with class-dependent means of $\hat{\boldsymbol{\mu}}_i$ and class-dependent covariances of
$\hat{\boldsymbol{\Sigma}}_i$. Alternately, identical covariances can be assumed and a pooled covariance estimate $\hat{\boldsymbol{\Sigma}}_P$
used for all classes:

$$\hat{\boldsymbol{\Sigma}}_P = \frac{1}{N_C}\sum_{i=1}^{N_C}\hat{\boldsymbol{\Sigma}}_i . \tag{2.21}$$

The assumed MVG distributions effectively represent posterior conditional probabilities
that can be used to measure class likelihood for projected Testing fingerprint $\mathbf{f}$. For a
pooled covariance estimate, likelihood estimation can be implemented as [33, 44],

$$P(\mathbf{f}|c_i) = \frac{1}{(2\pi)^{(N_C-1)/2}\,\det\left(\hat{\boldsymbol{\Sigma}}_P\right)^{1/2}}\exp(Q) , \tag{2.22}$$

where,

$$Q = -\frac{1}{2}\left(\mathbf{f}-\hat{\boldsymbol{\mu}}_i\right)^T\hat{\boldsymbol{\Sigma}}_P^{-1}\left(\mathbf{f}-\hat{\boldsymbol{\mu}}_i\right) . \tag{2.23}$$


Class likelihood values are used for ML classification based on Bayesian decision theory
by assigning a class label to subsequent Testing data. In the case of $N_C$ classes, a given
projected Testing fingerprint $\mathbf{f}$ is assigned to class $c_i$ according to,

$$P(c_i|\mathbf{f}) > P(c_j|\mathbf{f}) \quad \forall\, j \neq i , \tag{2.24}$$

where $i = 1, 2, \ldots, N_C$ and $P(c_i|\mathbf{f})$ is the conditional posterior probability that $\mathbf{f}$ belongs to
class $c_i$. The conditional posterior probability is found by applying Bayes’ Rule and
using class likelihood values as shown [33, 44]:

$$P(c_i|\mathbf{f}) = \frac{P(\mathbf{f}|c_i)\,P(c_i)}{P(\mathbf{f})} , \tag{2.25}$$

where prior probabilities are assumed equal for all classes ($P(c_i)=1/N_C$) and thus can be
neglected when making the (2.24) comparison. Since (2.25) is applied for a given projected
fingerprint $\mathbf{f}$, $P(\mathbf{f})$ remains constant across all $c_i$ and can be neglected as well. Using the
decision criteria from (2.24), projected “testing” fingerprints $\mathbf{f}$ are assigned a class label
$c_i$ based on maximum posterior probability, with correct classification occurring when the
assigned class label matches the true class label. This ML classification process is used in
the research methodology to perform device classification as described in Section 3.5.2.
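
The ML decision rule can be sketched as follows for projected 2-dimensional fingerprints ($N_C=3$ classes, so $N_C-1=2$). This is an illustrative example only: the function names and training values are hypothetical, the pooled covariance is taken as the simple average of the class covariances, and equal priors are assumed so that the largest likelihood also gives the largest posterior in (2.24).

```python
import math

def mean2(rows):
    """Sample mean of a list of projected 2-D fingerprints."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]

def cov2(rows, mu):
    """2x2 sample covariance about mean mu."""
    n = len(rows)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - mu[0], r[1] - mu[1])
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j] / n
    return c

def train(classes):
    """Return class means and the pooled covariance (average of class covariances)."""
    means = [mean2(rows) for rows in classes]
    covs = [cov2(rows, mu) for rows, mu in zip(classes, means)]
    n_c = len(classes)
    pooled = [[sum(c[i][j] for c in covs) / n_c for j in range(2)] for i in range(2)]
    return means, pooled

def likelihood(f, mu, cov):
    """Multivariate Gaussian likelihood P(f | c_i) with a shared covariance."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = (f[0] - mu[0], f[1] - mu[1])
    q = -0.5 * (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
                + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    return math.exp(q) / ((2 * math.pi) ** (len(f) / 2) * math.sqrt(det))

def classify(f, means, pooled):
    """Assign f to the class index with the largest likelihood (equal priors assumed)."""
    scores = [likelihood(f, mu, pooled) for mu in means]
    return scores.index(max(scores))

if __name__ == "__main__":
    training = [
        [[0.0, 0.0], [0.4, 0.1], [0.1, 0.5], [0.5, 0.4]],
        [[3.0, 0.0], [3.4, 0.2], [3.1, 0.6], [3.5, 0.5]],
        [[0.0, 3.0], [0.3, 3.1], [0.2, 3.5], [0.6, 3.4]],
    ]
    means, pooled = train(training)
    print(classify([3.2, 0.3], means, pooled))
```

Because the prior $P(c_i)$ and the evidence $P(\mathbf{f})$ are common to every class, comparing likelihoods directly is sufficient for the label decision.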

III. Research Methodology


This  chapter  provides  the  methodology  used  to  conduct  this  research  and  obtain 
results  presented  in  Chap.  4.  Topics  are  presented  sequentially  relative  to  the 
RF-DNA  processing  overview  shown  in  Fig.  3.1.  This  process  begins  with  ZigBee 
device  signal  collections  made  in  three  different  environment  scenarios  as  described 
in  Section  3.1.  Section  3.2  explains  the  post-processing  procedure  that  is  performed 
on  collected  emissions  prior  to  RF-DNA  fingerprint  generation.  Section  3.3  provides 
specifics  on  how  ZigBee  time-domain  features  are  used  to  generate  RF-DNA  fingerprints. 
Dimensional Reduction Analysis (DRA) and two quantitative selection methods, 1) pre-classification
Kolmogorov-Smirnov (KS)-Test p-value ranking and 2) post-classification
Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) $\lambda_i$ relevance
ranking, are introduced in Section 3.4. As explained in Section 3.5, RF-DNA fingerprints
were input to a Multiple Discriminant Analysis (MDA) process and the resultant model
used for both Maximum Likelihood (ML) Device Classification (Section 3.5.2) and device
ID verification, specifically Authorized Device ID Verification (Section 3.5.3.1) and Rogue
Device Rejection (Section 3.5.3.2).

3.1  Signal  Collection 

An Agilent E3238S [2] receiver (Rx) was used to collect emissions from ten Texas
Instruments (TI) CC2420 2.4 GHz IEEE 802.15.4 ZigBee devices (denoted herein as
Dev1, Dev2, ..., Dev10). The Agilent Rx can collect signals at a Radio Frequency (RF)
center frequency spanning $f_c$=20.0 MHz to $f_c$=6.0 GHz using a tunable RF filter with
an instantaneous bandwidth of $W_{RF}$=36.0 MHz. The selected frequency band is down-converted
to an Intermediate Frequency (IF) of $f_{IF}$=70 MHz and digitized by an $N_b$=12 bit
Analog-to-Digital Converter (ADC) operating at a sampling rate of $f_s$=95 Mega-Samples-per-second
(MSps). The digitized signal is then digitally down-converted to near baseband, baseband
filtered with a specific (user defined) bandwidth $W_{BB}$, and automatically sub-sampled at a
rate based on $W_{BB}$ in accordance with Nyquist criteria requirements. All resultant collected
samples are stored as complex In-phase and Quadrature (I-Q) data in a .cap file format [3].

Figure 3.1: Overview of AFIT’s RF-DNA Fingerprinting Process [36].

Prior to device signal collections all CC2420 radio transceivers were programmed to
transmit 2.4 GHz IEEE 802.15.4 compliant packets (bursts, pulses, etc.) with an arbitrary
payload at a rate of 14 transmissions-per-second. The arbitrary payload is irrelevant
to this research because RF-DNA fingerprints are generated from the Synchronization
Header (SHR) region within the transmitted bursts. For each transmitting (Tx) CC2420
device, a total of $N_B$=1000 burst responses were collected under three operating conditions,
including: 1) both the Tx and Rx antenna inside a Ramsey STE3000B RF shielded
anechoic chamber (“CAGE”) as done in [11, 31], 2) the Tx and Rx having a clear Line-of-Sight
(“LOS”) path down a hallway (location A in Fig. 3.2 [12]), and 3) the Tx and Rx on
opposite sides of a wall (“WALL”, location B in Fig. 3.2 [12]).

During “CAGE” collections the Tx position was consistently maintained at 20 cm
from  a  dipole  antenna  in  an  RF-absorbent  Ramsey  STE3000B  test  enclosure  that  was 
connected  to  the  Agilent  Rx  input  by  a  shielded  cable.  For  the  experimental  “LOS” 
collections  (location  A),  the  devices  under  test  (Tx)  were  placed  5.0  m  from  a  stationary 
6  dB  gain  Ramsey  LPY2  log  periodic  antenna  [32]  attached  to  the  Rx.  For  “WALL” 
collections  (location  B),  the  devices  (Tx)  were  placed  behind  an  interior  wall  (5.5  m  from 
Rx)  consisting  of  1.6  cm-thick  drywall  separated  by  9.2  cm  steel  studs  spaced  40.6  cm 
on  center,  for  a  total  thickness  of  12.4  cm,  where  fiberglass  sound  batting  fills  inter-stud 
spaces.  For  both  “LOS”  and  “WALL”  collection  locations  the  log  periodic  antenna  was 
aligned  with  the  main  beam  pointing  down  an  office  environment  hallway  at  the  collection 
device locations shown in Fig. 3.2. The collected Signal to Noise Ratio (SNR) over the
Region Of Interest (ROI) was found to be $SNR_C \approx$ 50, 40, and 30 dB for “CAGE”, “LOS”, and
“WALL” locations, respectively.


Figure  3.2:  Operational  indoor  collection  geometry  showing  collection  receiver  antenna 
pattern  and  ZigBee  device  (A)  “LOS”  and  (B)  “WALL”  experimental  collection  locations. 

3.2 Post-Collection Processing

The  post-collection  processing  here  was  performed  similarly  to  the  methodology  used 
in  [12,  31].  The  Agilent  receiver  collection  files  (.cap  format)  were  converted  for  use  with 
MATLAB®  (.mat  format)  and  post-collection  processed  by  1)  detecting  individual  bursts 
using an amplitude-based threshold detection process, 2) extracting detected bursts from the
collection file, 3) down-converting individual bursts and applying baseband digital filtering,
and 4) power-scaling noise to achieve the desired SNR and model the effects of differing
channel conditions. The Additive White Gaussian Noise (AWGN) was digitally filtered the
same as collected bursts and power-scaled to achieve the desired SNR∈[0, 30] dB. Given
the high collection $SNR_C$ over the ROI, the like-filtered AWGN was added directly to the
collected I-Q data and was the dominant noise source.

3.2.1  Burst  Detection. 

The CC2420 devices were programmed to transmit bursts at a rate of approximately 14
bursts-per-second (1 burst every 69 ms) and transmissions were collected from one device
at a time. The Agilent receiver stored the collected transmissions in a .cap file format
which was converted to a .mat file for use in MATLAB®. Detection and extraction of burst
responses were performed using an amplitude-based threshold detection process with specific
parameters including: termination threshold ($t_T$), detection threshold ($t_D$), minimum burst
length ($P_{min}$), and maximum burst length ($P_{max}$). The instantaneous amplitude response
($a[n]$) of collected ZigBee bursts was calculated using (2.1) and converted to dB using,

$$a[n]_{dB} = 20 \log_{10}(a[n]). \tag{3.1}$$

The result of (3.1) is illustrated in Fig. 3.3 for a collection containing $N_B = 4$ bursts
and a typical burst detection termination threshold ($t_T$). Burst detection begins by finding
the global peak amplitude response $C_G = \max\{|a[n]|\}\ \forall\, n$ in a given (.mat) collection
file. Detection threshold $t_D$ is then applied as shown in Fig. 3.4 to determine the leading and
trailing edges of a declared burst; these edges correspond to the leading/trailing edge sample
indices ($n_l$, $n_t$) within $a[n]$ at which $|a[n]| \approx C_G - t_D$ occurs. The estimated burst
duration ($n_t - n_l$) is calculated and compared to $P_{min}$ and $P_{max}$ to determine if the
declared burst meets the estimated ZigBee pulse width, $P_{min} < (n_t - n_l) < P_{max}$. If the
declared burst meets all requirements, it becomes a detected burst and is extracted (removed
from the collection file); else, the declared burst is discarded. This iterative process
continues by finding the next maximum peak amplitude value $C_{MAX} = \max\{|a[n]|\}$, estimating
burst duration, and so on. The detection process is terminated when either 1) the desired
number of bursts is detected, or 2) the condition $C_{MAX} < C_G - t_T$ occurs for a declared
burst, indicating that $\max\{|a[n]|\}$ is below the pre-established termination threshold $t_T$.
The specific values used for ZigBee burst detection are provided in Table 3.1.
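
The iterative detection loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the processing code used for the collections: the function name `detect_bursts`, the use of sample counts for the duration limits, the small floor inside the logarithm, and the synthetic amplitude record are all hypothetical.

```python
import math

def detect_bursts(amp, t_d_db, t_t_db, p_min, p_max, max_bursts):
    """Peak-referenced threshold detection on a list of linear amplitudes.

    p_min and p_max are given in samples here (an assumption for illustration).
    Returns (leading, trailing) index pairs in the order they were detected.
    """
    # Amplitude in dB as in (3.1); the floor avoids log10(0).
    a_db = [20.0 * math.log10(max(abs(a), 1e-12)) for a in amp]
    c_g = max(a_db)                        # global peak, C_G
    edge = c_g - t_d_db                    # edge level, C_G - t_D
    used = [False] * len(a_db)             # samples already extracted or discarded
    bursts = []
    while len(bursts) < max_bursts:
        remaining = [i for i in range(len(a_db)) if not used[i]]
        if not remaining:
            break
        peak = max(remaining, key=lambda i: a_db[i])   # next maximum, C_MAX
        if a_db[peak] < c_g - t_t_db:      # termination: C_MAX < C_G - t_T
            break
        lead = peak                        # walk outward to the burst edges
        while lead > 0 and not used[lead - 1] and a_db[lead - 1] >= edge:
            lead -= 1
        trail = peak
        while (trail < len(a_db) - 1 and not used[trail + 1]
               and a_db[trail + 1] >= edge):
            trail += 1
        for i in range(lead, trail + 1):   # remove the declared burst
            used[i] = True
        if p_min < (trail - lead) < p_max: # keep only plausible pulse widths
            bursts.append((lead, trail))
    return bursts

# Synthetic record: two full-length bursts and one short spike over a noise floor.
amp = [0.001] * 100
for i in range(10, 20):
    amp[i] = 1.0
for i in range(40, 50):
    amp[i] = 0.8
for i in range(70, 72):
    amp[i] = 0.9
bursts = detect_bursts(amp, t_d_db=9.0, t_t_db=6.0, p_min=5, p_max=15, max_bursts=10)
```

In this example the short spike is declared but discarded by the duration check, and the loop stops once the strongest remaining sample falls more than 6 dB below the global peak.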


Table 3.1: Burst detection parameters for ZigBee transmission collections.

Parameter               Variable   Value
Termination Threshold   t_T        6.0 dB
Detection Threshold     t_D        9.0 dB
Pulse Min Duration      P_min      850 μsec
Pulse Max Duration      P_max      870 μsec

[Figure 3.3 plot; horizontal axis: Time (seconds).]

Figure 3.3: Representative ZigBee collection showing $N_B = 4$ bursts and a typical processing
termination threshold ($t_T$).


[Figure 3.4 plot; horizontal axis: Time (ms), spanning 57.6 ms to 58.6 ms.]

Figure 3.4: Representative detected ZigBee burst and typical detection threshold ($t_D$).

3.2.2 Digital Filtering.

The detected bursts are down-converted to baseband ($f = 0$) using a Power Spectral
Density (PSD) average estimated center frequency $f_c$ for the 16 possible channels spanning
2.4 GHz to 2.4835 GHz [26]. The down-conversion frequency ($f_{DC}$) is estimated
channel-by-channel such that bursts within estimated channels are all down-converted by the
same estimated channel frequency. The down-converted signal is then digitally filtered using
an 8th-order Butterworth baseband filter having a -3 dB bandwidth of $W_{BB} = 1.0$ MHz.
Figure 3.5 shows the PSD of a ZigBee baseband emission overlaid with the impulse
response of the Butterworth baseband filter.


[Figure 3.5 plot; horizontal axis: Frequency (MHz).]

Figure 3.5: Representative ZigBee burst PSD response overlaid with an 8th-order
Butterworth digital filter impulse response.
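
The frequency-shift part of the down-conversion step can be illustrated with a short Python sketch. It covers only the complex mixing; the PSD-based center frequency estimate and the Butterworth filter are omitted. The sample rate and offset below are hypothetical values chosen for the example (12 MHz corresponds to 1920 samples over roughly 160 μs, but the actual collection rate is not stated in this section).

```python
import cmath
import math

def down_convert(samples, f_shift_hz, fs_hz):
    """Shift a complex sequence down in frequency by f_shift_hz."""
    step = -2.0 * math.pi * f_shift_hz / fs_hz     # phase advance per sample
    return [s * cmath.exp(1j * step * n) for n, s in enumerate(samples)]

fs = 12.0e6         # hypothetical sample rate (Hz)
f_offset = 1.0e6    # hypothetical estimated channel offset (Hz)

# A complex tone at the offset frequency stands in for a collected burst.
tone = [cmath.exp(2j * math.pi * f_offset * n / fs) for n in range(240)]
baseband = down_convert(tone, f_offset, fs)        # tone now sits at f = 0
```

After mixing, every sample of the test tone is (numerically) the constant 1 + 0j, which is what a carrier shifted exactly to $f = 0$ looks like.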


3.2.3  Signal-to-Noise  Ratio  Scaling. 

The high collected $SNR_C$ over the ROI allows for the addition of power-scaled,
like-filtered AWGN to generate analysis signals with $SNR_A \in [0, 30]$ dB. These analysis
signals allow for classification and verification performance assessment under varying
channel conditions. Using the analytic expression for an arbitrary complex sequence $\{x(i)\}$,
$i = 1, 2, \ldots, K$, the estimated average power $X$ of the sequence is given by,

$$X = \frac{1}{K} \sum_{i=1}^{K} x(i)\, x^*(i), \tag{3.2}$$

where  x*(i)  is  the  complex  conjugate  of  x(i).  The  collected  ZigBee  signals  are  complex  and 
consist  of  two  components, 

$$s_c(i) = s_t(i) + n_b(i), \tag{3.3}$$

where $s_t(i)$ and $n_b(i)$ are the collected transmitted signal and collected background noise,
respectively. The total power in $s_c$ can be calculated as,

$$S_c = S_t + N_b, \tag{3.4}$$

where $S_c$ was measured over the ROI and $N_b$ was measured when no signal was present
using (3.2), given by,

$$S_c = \frac{1}{K} \sum_{i=1}^{K} s_c(i)\, s_c^*(i), \tag{3.5}$$

$$N_b = \frac{1}{K} \sum_{i=1}^{K} n_b(i)\, n_b^*(i). \tag{3.6}$$


Rearranging (3.4), the transmitted signal power $S_t$ is calculated and the estimated collected
SNR in dB over the ROI is given by,

$$SNR_C = 10 \times \log_{10}\!\left(\frac{S_t}{N_b}\right), \tag{3.7}$$

which yielded $SNR_C \approx$ 50, 40, and 30 dB over the ROI for the “CAGE”, “LOS”, and
“WALL” location collections, respectively.

The desired scaled analysis signal $s_a(i)$ is generated by adding zero-mean, like-filtered,
independent AWGN samples according to,

$$s_a(i) = s_t(i) + n_b(i) + n_G(i), \tag{3.8}$$


where the average power in $\{n_G(i)\}$ is scaled to achieve a desired range of $SNR_A$.
A complex, zero-mean, normally distributed random sequence with an estimated
average power of 1 ($N_G = 1$) produces the AWGN samples. This complex sequence was
digitally filtered by the same Butterworth filter used for the collected signal to produce
like-filtered AWGN samples. The sequence is then power-scaled by $R_n$ to achieve the
desired $SNR_A$, with $R_n$ calculated using,

$$R_n = \sqrt{10^{-SNR_A/10} \times S_t}, \tag{3.9}$$

which results in a total average AWGN power $N_G$ given by,

$$N_G = \frac{1}{K} \sum_{i=1}^{K} R_n\, n_{AWGN}(i)\, R_n\, n_{AWGN}^*(i). \tag{3.10}$$

The corresponding analysis $SNR_A$ is then,

$$SNR_A = 10 \times \log_{10}\!\left(\frac{S_t}{N_b + N_G}\right). \tag{3.11}$$

For general collection conditions the scaled AWGN power is much greater than
the collected background noise power ($N_G \gg N_b$) and (3.11) reduces to,

$$SNR_A \approx 10 \times \log_{10}\!\left(\frac{S_t}{N_G}\right). \tag{3.12}$$
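
The power estimate in (3.2) and the noise scaling that leads to (3.12) can be illustrated with a short Python sketch. It makes several simplifying assumptions that are not part of the collection process: the background noise $n_b$ is taken as negligible, the like-filtering step is skipped, the "collected" signal is a synthetic constant-envelope tone, and the helper names are hypothetical. The scale factor is chosen so that the ratio of signal power to scaled noise power equals the target analysis SNR.

```python
import math
import random

def average_power(x):
    """Estimated average power of a complex sequence, as in (3.2)."""
    return sum((v * v.conjugate()).real for v in x) / len(x)

def scale_noise(signal_power, noise, snr_a_db):
    """Scale a noise sequence so signal_power / noise_power equals snr_a_db."""
    n_power = average_power(noise)
    unit = [v / math.sqrt(n_power) for v in noise]               # force unit power
    r_n = math.sqrt(signal_power * 10.0 ** (-snr_a_db / 10.0))   # scale factor
    return [r_n * v for v in unit]

rng = random.Random(0)
K = 2000
# Hypothetical stand-in for a collected burst and for complex Gaussian noise.
s_t = [complex(math.cos(0.1 * i), math.sin(0.1 * i)) for i in range(K)]
awgn = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(K)]

n_g = scale_noise(average_power(s_t), awgn, snr_a_db=10.0)
s_a = [s + n for s, n in zip(s_t, n_g)]          # analysis signal with n_b = 0
achieved_db = 10.0 * math.log10(average_power(s_t) / average_power(n_g))
```

Because the noise is normalized to unit power before scaling, `achieved_db` matches the requested 10 dB to within floating-point error for this realization.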

3.3 RF Fingerprint Generation

This section provides details on statistical time-domain RF-DNA fingerprint generation
as introduced in Section 2.2. For this research, the ZigBee SHR region was selected as
the ROI given that it 1) was experimentally observed within all bursts collected from all
devices and 2) is independent of MAC frame information and payload data. The SHR region
(40 bits total length) consists of a preamble sequence (32 bits in length) and the
Start-of-Frame Delimiter (SFD) (8 bits in length), and it spanned 1920 collected time samples.

For this research the SHR time-domain signals were divided into $N_R = 80$ subregions (2
subregions for each bit), with 24 time samples in each subregion. $N_R = 80$
subregions was chosen because it showed improved device discrimination performance
when compared to $N_R = 40$ subregions (1 subregion for each bit). Full-dimensional RF-DNA
fingerprints were generated using (2.7) through (2.14) based on $N_C = 3$ signal characteristics
($a$, $\phi$, $f$) and $N_M = 3$ statistic metrics ($\sigma^2$, $\gamma$, $\kappa$), for a total of
$N_{Full} = N_M \times (N_R + 1) \times N_C = 729$ features per RF-DNA fingerprint. For this
research the standard deviation statistic metric was omitted due to its close relation to
variance. Figure 3.6 shows a representative time domain response for a ZigBee SHR region.
The experimentally observed SHR duration of $T_{SHR} \approx 160$ μs is consistent with the
IEEE 802.15.4 specification [26].


Figure  3.6:  Representative  ZigBee  SHR  response  used  as  the  region  of  interest  for  RF-DNA 
fingerprint  generation. 
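
The feature count above can be checked with a small Python sketch that computes variance, skewness, and kurtosis over 80 subregions plus the full region for each of three characteristics. The statistics use plain population formulas as an assumption; the definitions in (2.7) through (2.14) may center or normalize the responses differently, and the input sequences here are synthetic placeholders rather than measured amplitude, phase, and frequency responses.

```python
import math

def region_stats(x):
    """Variance, skewness, and kurtosis of a real sequence (population form)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return [0.0, 0.0, 0.0]           # flat region: nothing to normalize by
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in x) / n
    kurt = sum(((v - mean) / std) ** 4 for v in x) / n
    return [var, skew, kurt]

def characteristic_features(x, n_regions):
    """Statistics for n_regions equal subregions, then for the full region."""
    size = len(x) // n_regions
    feats = []
    for r in range(n_regions):
        feats.extend(region_stats(x[r * size:(r + 1) * size]))
    feats.extend(region_stats(x))        # region N_R + 1 is the whole ROI
    return feats

def fingerprint(amplitude, phase, frequency, n_regions=80):
    """Concatenate the features of the three signal characteristics."""
    return (characteristic_features(amplitude, n_regions)
            + characteristic_features(phase, n_regions)
            + characteristic_features(frequency, n_regions))

# Synthetic 1920-sample responses standing in for a, phi, and f.
a = [1.0 + 0.1 * math.sin(0.05 * i) for i in range(1920)]
phi = [math.sin(0.02 * i) for i in range(1920)]
f = [math.cos(0.03 * i) for i in range(1920)]
F = fingerprint(a, phi, f)               # 3 x 81 x 3 = 729 features
```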


3.4  Dimensional  Reduction  Analysis  (DRA) 

The Fisher-based MDA process in Section 2.3 inherently masks feature contribution to
resultant classification performance and it is impossible to determine which features have
the greatest impact. The goal of Dimensional Reduction Analysis (DRA) is to minimize
the number of RF fingerprint features ($N_F$) while achieving a certain classification accuracy.
One approach to minimize the number of features (dimensions) is to use the features that
provide the most significant contribution to classification while removing less relevant
features. Insight into feature relevance is addressed here quantitatively using: 1) a
pre-classification KS-Test goodness-of-fit test [12, 31], and 2) a post-classification feature
relevance ranking provided by GRLVQI processing [33, 36].

The KS-Test goodness-of-fit selection process includes [12, 31]:

1) Generating a full-dimensional ($N_F$) feature set using (2.14) for $N_{SHR}$ responses
at a specific SNR from each of the $N_D$ devices to be classified.

2) Conducting $N_{PW} = [(N_D - 1) N_D]/2$ pairwise two-sample KS-tests using the
$N_F$-dimensional feature sets between every two devices under test, and forming a
matrix of resultant p-values with dimension $N_{PW} \times N_F$.

3) Summing each feature’s p-values across pairwise combinations and rank-ordering
the summed p-values from lowest-to-highest while tracking feature index number.

4) Determining a summed p-value cutoff threshold, or arbitrarily setting a most
relevant feature length $l$, to decide which features are retained for classification.

The quantitative pre-classification feature reduction process can be used to identify
and select a most relevant, length-$l$ subset of the full-dimensional RF-DNA feature set F
prior to Multiple Discriminant Analysis, Maximum Likelihood (MDA/ML) classification.
The KS-Test is a suitable option for analyzing statistical feature differences and is
used here to quantify differences in Cumulative Distribution Functions (CDF) between
full-dimensional RF-DNA features from two devices. KS-Test results in Section 4.3
are presented as summed p-values from all pairwise combinations of the $N_D$ devices
considered, where lower p-values indicate a more significant data set difference [31].
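
The pairwise KS-test ranking can be sketched in Python as shown below. This is an illustrative stand-in rather than the code behind the reported results: the p-value uses the common asymptotic Kolmogorov series with a small-sample correction term, the data layout `device_features[d][f]` is a hypothetical choice, and the example data are synthetic.

```python
import math
from itertools import combinations

def ks_two_sample(x, y):
    """Two-sample KS statistic D and an asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:      # advance both empirical CDFs past v
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    n_e = n * m / (n + m)
    lam = (math.sqrt(n_e) + 0.12 + 0.11 / math.sqrt(n_e)) * d
    if lam < 0.3:                        # series converges poorly; p is ~1 here
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

def rank_features(device_features):
    """device_features[d][f] holds the values of feature f for device d."""
    n_feat = len(device_features[0])
    summed = [0.0] * n_feat
    for a, b in combinations(range(len(device_features)), 2):
        for f in range(n_feat):
            summed[f] += ks_two_sample(device_features[a][f],
                                       device_features[b][f])[1]
    order = sorted(range(n_feat), key=lambda f: summed[f])   # lowest sum first
    return order, summed

# Three synthetic devices: feature 0 differs by device, feature 1 does not.
devices = [[[10.0 * d + 0.1 * i for i in range(50)],
            [0.1 * i for i in range(50)]] for d in range(3)]
order, summed = rank_features(devices)
```

With three devices there are three pairwise tests per feature, so the device-independent feature accumulates a summed p-value of 3 while the separated feature accumulates a value near zero and is ranked first.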

The second approach to feature selection is based on GRLVQI processing, which
inherently provides an indication of feature relevance following model development. The
process here was adopted entirely from previous demonstrations showing that GRLVQI
is a powerful tool for performing device classification and DRA [33, 38]. The GRLVQI
process provides a relevance indicator ($\lambda_i$ value) for each feature comprising the RF-DNA
fingerprint at a specified SNR. The relevance value provides a measure of contribution
to class (device) separation within the GRLVQI classification process. The higher the
relevance value, the greater the impact on class separation. Feature DRA is achieved by
rank-ordering the $\lambda_i$ values and selecting the top-ranked, arbitrary length $l$, features from
the full-dimensional feature set.

3.5  Device  Discrimination  Process 

Statistical  RF-DNA  fingerprints  for  ZigBee  device  SHR  responses  are  used  as  inputs 
into  a  device  discrimination  process.  Figure  3.7  shows  a  block  diagram  for  the  device 
discrimination  process  used  in  this  research.  This  process  begins  with  separating  collected 
RF fingerprints into “Training” and “Testing” sets, where the “Training” fingerprints are
used  for  Multiple  Discriminant  Analysis  (MDA)  model  development.  Once  a  model  is 
developed,  “Testing”  fingerprints  are  projected  into  the  mapped  feature  space  and  used  for 
either  1)  Device  Classification  (a  1  vs.  ND  “Looks  most  like?”  assessment)  or  2)  Device  ID 
Verification  (a  1  vs.  1  “Looks  how  much  like?”  assessment). 

3.5.1  MDA  Model  Development. 

As introduced in Section 2.3, MDA can be applied when discrimination of two or
more classes (devices) is required ($N_D \geq 2$). For results presented in Chapter 4, MDA model
development is performed using a pool of RF-DNA fingerprints from $N_D = 4$ ZigBee devices
(Dev1, Dev2, Dev3, and Dev4) constructed as a “hybrid” data set of fingerprints from the
“CAGE”, “LOS”, and “WALL” collection scenarios; the result is referred to as a “hybrid”
MDA model throughout the document. During model development MDA reduces input
feature dimensionality by projecting $N_F$ fingerprint features onto an ($N_D - 1$)-dimensional
subspace. The MDA projection matrix $\mathbf{W}$ is developed as shown in Fig. 3.8 using an
iterative K-fold training process with a goal toward projecting higher-dimensional input
fingerprint F data into a lower-dimensional subspace such that inter-class separation is
maximized and intra-class spread is minimized [13]. The parenthetical SNR denotes that
$\mathbf{W}(SNR)$, $\mu_i(SNR)$, and $\Sigma_P(SNR)$ generally vary with SNR, requiring MDA models
to be developed for each SNR.

Figure 3.7: Block diagram of device discrimination process supporting both classification
and verification using selected measures of similarity and test statistics.


[Figure 3.8 diagram; block labels: Signal Collection (Agilent E3238), Post-Collection
Processing, Input Data, K-Fold MDA; outputs: W(SNR), μ(SNR), Σ(SNR).]

Figure 3.8: Signal collection, post-collection and K-fold MDA model development
(training) processes. A representative 2D Fisher space is shown for $N_D = 3$ ZigBee devices
operating at SNR = 10 dB. Clustering of the 100 projected training fingerprints (o) per
device is shown relative to class means (•).


For all results presented in Chapter 4, MDA model development was accomplished by
using a K-fold cross-validation training process, shown in Fig. 3.9, where values of $K = 5$
and $K = 10$ are commonly used and provide sufficient statistical certainty [25]; a value of
$K = 5$ was used here. The K-fold training process consists of:

1. Randomly parsing “Training” fingerprints into $K$ blocks.

2. Separating the $K$ blocks such that $K - 1$ blocks are used for training and one block is
retained for model validation.

3. Performing MDA transformation on the $K - 1$ blocks using projection matrix $\mathbf{W}_k$, as
described in Section 2.3.

4. Computing training class (device) means ($\mu_i$) and pooled covariances ($\Sigma_P$) to be used
for Multivariate Gaussian (MVG) distributed models, as described in Section 2.4.

5. Tracking fold ML classification performance ($\%C_k$) using the retained validation
block, as described in Section 2.4.

6. Repeating steps 2-5 such that a different block is retained for validation until $K$
iterations are completed.

7. Determining the $\mathbf{W}_k$ and corresponding $\mu_i$ and $\Sigma_P$ that achieved maximum (best)
classification performance (highest $\%C_k$).
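
The bookkeeping of the K-fold steps above can be sketched in Python as follows. To keep the example self-contained, the model is a deliberately simple stand-in (class means of a single scalar feature with a nearest-mean decision) and is not MDA/ML; the function names, the interleaved block assignment, and the synthetic data are all assumptions made for illustration.

```python
import random

def k_fold_best(samples, labels, k, train_fn, score_fn, seed=0):
    """Return (best_model, best_score, fold_scores) over k validation folds."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)              # step 1: random parsing
    blocks = [idx[b::k] for b in range(k)]        # k nearly equal blocks
    best_model, best_score, scores = None, -1.0, []
    for held in range(k):                         # steps 2-6: rotate held-out block
        train = [i for b in range(k) if b != held for i in blocks[b]]
        model = train_fn([samples[i] for i in train],
                         [labels[i] for i in train])
        score = score_fn(model,
                         [samples[i] for i in blocks[held]],
                         [labels[i] for i in blocks[held]])
        scores.append(score)
        if score > best_score:                    # step 7: keep the best fold
            best_model, best_score = model, score
    return best_model, best_score, scores

def train_means(xs, ys):
    """Stand-in model (not MDA): per-class mean of a scalar feature."""
    means = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(vals) / len(vals)
    return means

def percent_correct(means, xs, ys):
    """Percent of validation samples assigned to the nearest class mean."""
    hits = sum(min(means, key=lambda c: abs(x - means[c])) == y
               for x, y in zip(xs, ys))
    return 100.0 * hits / len(xs)

# Two well-separated synthetic classes, 50 samples each.
samples = [0.01 * i for i in range(50)] + [5.0 + 0.01 * i for i in range(50)]
labels = [0] * 50 + [1] * 50
model, best, fold_scores = k_fold_best(samples, labels, 5,
                                       train_means, percent_correct)
```

Swapping `train_means` and `percent_correct` for functions that build the projection matrix and score ML classification would follow the same loop structure.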

3.5.2  Device  Classification. 

[Figure 3.9 diagram; panel title: MDA/ML Model Development.]

Figure 3.9: Illustration of K-fold cross-validation training process used for MDA model
development. The “best” model $\mathbf{W}_B$ is selected as the $\mathbf{W}_k$ yielding maximum $\%C_k$.

Once MDA model development is accomplished, device classification is performed
using a Maximum Likelihood (ML) classifier as described in Section 2.4, with input
“Testing” fingerprints classified as being affiliated with one of $N_D = 4$ possible devices.
For ML classification, the prior probabilities are assumed to be equal, the costs uniform,
and the device likelihoods MVG distributed with means ($\mu_i$) and covariances ($\Sigma_P$)
as computed during MDA model development. The ML classification process consists of:
1) inputting a “Testing” fingerprint $\mathbf{F}_j$ for a collected emission from an unknown device
$D_j$, 2) projecting $\mathbf{F}_j$ into the Fisher space using $\mathbf{f}_j = \mathbf{W}^T \mathbf{F}_j$, and 3) associating
$\mathbf{f}_j$ as being from the device with the maximum conditional likelihood probability according to,

$$D_i : \arg\max_i \left[\, p(D_i \mid \mathbf{f}_j) \,\right], \tag{3.13}$$

where $i = 1, 2, \ldots, N_D$ and $p(D_i \mid \mathbf{f}_j)$ is the conditional likelihood probability that
fingerprint $\mathbf{f}_j$ belongs to device $D_i$. Correct classification is achieved when projected
“Testing” fingerprints are classified to be from their true device. Average percent correct (%C)
device classification is calculated as the percentage of the time the classifier correctly
assigns the fingerprint to its true device over all trials.
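
A minimal Python sketch of the ML decision for a two-dimensional Fisher space (three devices) is given below. It assumes equal priors and a shared pooled covariance, as stated above, and evaluates the Gaussian likelihoods in the log domain for numerical safety. The class means, covariance, and test point are hypothetical values, and the closed-form 2x2 inverse is used only to keep the example dependency-free.

```python
import math

def log_mvg(f, mean, cov):
    """Log of a 2-D Gaussian density at f for a given mean and 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = f[0] - mean[0], f[1] - mean[1]
    # Quadratic form using the closed-form inverse of the 2x2 covariance.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * q - math.log(2.0 * math.pi * math.sqrt(det))

def classify_ml(f, means, pooled_cov):
    """Return (index of most likely device, normalized posteriors)."""
    logs = [log_mvg(f, m, pooled_cov) for m in means]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]   # equal priors cancel out
    total = sum(weights)
    posteriors = [w / total for w in weights]
    best = max(range(len(means)), key=lambda i: posteriors[i])
    return best, posteriors

# Hypothetical projected class means and pooled covariance.
means = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
pooled = ((1.0, 0.2), (0.2, 1.0))
device, post = classify_ml((3.5, 0.2), means, pooled)
```

Normalizing the likelihoods does not change which device attains the maximum, so the returned index is the same decision the unnormalized rule would give.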

3.5.3  Device  ID  Verification. 

For device ID verification (a 1 vs. 1, claimed vs. actual, “Looks how much like?”
assessment), the process used here is consistent with the methodology used in [11, 12, 33].
The focus here is on answering “Does the device’s current RF-DNA fingerprint match the
stored RF fingerprint template associated with its claimed bit-level identity?”. RF-DNA
fingerprints can be used to authenticate a device’s claimed bit-level identity, i.e., a device
wants to access a network and has presented its MAC address, SIM number, IMEI number,
etc., to gain access [11]. Bit-level credentials can be easily replicated by rogue devices,
and RF-DNA fingerprint verification provides a means to mitigate unauthorized access
attempts. This is done by a 1-to-1 comparison of current vs. claimed RF signatures, with
the claimed signature being a stored template associated with the claimed bit-level identity.
Each designated authorized device in a network will have a stored RF signature reference
template that is used when a current “Testing” RF fingerprint is received and has claimed
the ID of an authorized device. The device ID verification process is used here for two
performance assessments, including:

1. Authorized Device ID Verification: Granting network access to authorized
devices presenting true bit-level credentials.

2. Rogue Device Rejection: Denying network access to unauthorized rogue
devices presenting false bit-level credentials.


3.5.3.1 Authorized Device ID Verification.

Authorized device ID verification is an assessment of how closely a device’s current
RF fingerprint matches the stored reference model associated with the claimed identity,
when only considering “Testing” RF fingerprints from a pool of $N_D$ authorized devices.
The similarity measure, or verification test statistic ($z_v$), reflects “How well” the current and
claimed RF fingerprint identities match and is compared with a threshold ($t_v$) to verify the
device’s claimed ID and grant or deny network access. Verification test statistics ($z_v$) can
be generated from probability-based measures or geometric measures such as distance,
spatial angle, etc. The specific test statistics used here for Device ID Verification are
inherently provided in the “posterior” output variable of the MATLAB® classify function.
The posterior matrix contains normalized conditional Multivariate Gaussian posterior
probabilities given by,

$$z_v = \frac{p(D_i \mid \mathbf{f}_j)}{\sum_{k=1}^{N_D} p(D_k \mid \mathbf{f}_j)}, \tag{3.14}$$

where $i = 1, 2, \ldots, N_D$ and $\mathbf{f}_j$ is the current projected RF fingerprint claiming to have an ID
from device $D_i$. For this research it is assumed that each authorized device claims $N_D$ IDs
(one for each authorized device). For a given “Testing” RF fingerprint this produces $N_D$
test statistics, where one test statistic is from the proper true device and $N_D - 1$ test statistics
are from devices claiming a false ID.

Authorized device ID verification is evaluated one claimed ID at a time, where test
statistics are generated for all $N_D$ authorized devices’ “Testing” data sets, producing two
Probability Mass Functions (PMFs): 1) an In-Class PMF, and 2) an Out-of-Class PMF.
The In-Class PMF is formed by test statistics ($z_v$) from a device that is actually
who it claims to be, i.e., the current RF fingerprint is from the proper authorized device. Each
authorized device will have a corresponding In-Class PMF and these are known as the
stored true reference templates associated with the authorized device’s ID. The Out-of-Class
PMF is generated using $z_v$ for the case when an authorized device falsely claims the identity
of a different authorized device. Figure 3.10 shows a representative In-Class and
Out-of-Class PMF generated from arbitrary test statistics ($z_v$) for a single claimed ID. The
In-Class probability is defined as $p[z_v \mid C_i, D_j]$, where $i = j$, $C_i$ is the claimed Device ID
($i = 1, 2, \ldots, N_D$) and $D_j$ is the actual (current) device. The corresponding Out-of-Class
probability is denoted as $p[z_v \mid C_i, D_j]$, where $i \neq j$ and $j = 1, 2, \ldots, N_D$.

Figure 3.10: Representative In-Class (unfilled) and Out-of-Class (filled) Probability Mass
Functions (PMFs) for an arbitrary test statistic ($z_v$). These are used to generate an
Authorized Device ID Verification ROC curve for a specific claimed ID and varying
threshold $t_v$.


Authorized device ID verification is evaluated for all claimed IDs and is assessed using
conventional Receiver Operating Characteristic (ROC) curve analysis [15]. True and false
device ID verification rates are generated by varying the threshold ($t_v$) shown in Fig. 3.10
and measuring the area of each PMF. True Verification Rate (TVR) is a measure of “how
well” current RF fingerprints match their true claimed ID and is the area under the In-Class
PMF when $z_v < t_v$. The corresponding False Verification Rate (FVR) provides a measure
of “how well” current RF fingerprints match a false claimed ID and is the area under the
Out-of-Class PMF when $z_v < t_v$. As the threshold ($t_v$) varies, corresponding TVR and FVR
are used to generate a ROC performance curve. As shown in Fig. 3.11, ROC performance
is a function of SNR. Representative thresholds ($t_1 < t_2 < t_3$) are shown to emphasize that a
given verification threshold $t_v$ dictates TVR and FVR performance.


Figure  3.11:  Representative  Authorized  Device  ID  Verification  ROC  curves  showing 
performance  variation  as  a  function  of  SNR,  i.e.,  degradation  for  decreasing  SNR. 
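
The threshold sweep that produces a ROC curve can be sketched in Python as follows. The sketch follows the convention stated above, in which a claim is verified when $z_v < t_v$; for a test statistic where larger values indicate a closer match the comparison would be reversed, which the hypothetical `accept_below` flag allows. The test statistics in the example are made-up numbers, not measured results.

```python
def roc_points(in_class, out_class, thresholds, accept_below=True):
    """Return a list of (t_v, TVR, FVR) tuples, one per threshold."""
    def rate(stats, t):
        # Fraction of test statistics that fall in the accept region.
        if accept_below:
            hits = sum(z < t for z in stats)
        else:
            hits = sum(z > t for z in stats)
        return hits / len(stats)
    return [(t, rate(in_class, t), rate(out_class, t)) for t in thresholds]

# Hypothetical test statistics for one claimed ID.
in_class = [0.10, 0.20, 0.30, 0.40]       # device is who it claims to be
out_class = [0.35, 0.50, 0.60, 0.70]      # authorized device with a false claim
curve = roc_points(in_class, out_class, [0.25, 0.45, 0.65])
```

Each tuple is one operating point; plotting TVR against FVR over a fine grid of thresholds traces the ROC curve for that claimed ID.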


3.5.3.2 Rogue Device Rejection.

Using the same process as authorized device ID verification, Rogue Device Rejection
capability can be measured when a rogue device presents false bit-level credentials in an
attempt to gain unauthorized network access. Rogue device rejection is an assessment of
how closely an unauthorized rogue device’s current RF fingerprint matches the stored true
reference template associated with the claimed identity presented by the rogue device.
“Testing” RF fingerprints are generated for $N_R$ previously unseen rogue devices using the
same method described in this chapter and projected into the ($N_D - 1$)-dimensional Fisher
subspace. The $z_v$ test statistics from (3.14) are generated to provide a measure of “How well”
the rogue device’s current RF fingerprint matches the claimed authorized device’s RF
fingerprint. For this research it is assumed that each rogue device claims $N_D$ IDs (one for each
authorized device). For a given rogue “Testing” RF fingerprint this produces $N_D$ test
statistics, each corresponding to a false claimed ID.

Rogue device rejection is evaluated one claimed ID at a time, where test statistics are
generated for a single rogue device’s “Testing” data set, producing a new Out-of-Class
PMF that is compared to the stored true reference template (In-Class PMF) associated with
the rogue device’s claimed ID. For a single claimed ID, Fig. 3.12 shows a representative
unchanged In-Class PMF from Fig. 3.10 and the new Out-of-Class PMF generated from
arbitrary test statistics ($z_v$). The In-Class probability is defined as $p[z_v \mid C_i, D_j]$, where
$i = j$, $C_i$ is the claimed Device ID ($i = 1, 2, \ldots, N_D$) and $D_j$ is the actual (current) device.
The corresponding Out-of-Class probability is denoted as $p[z_v \mid C_j, D_k]$, where $k \neq j$,
$j = 1, 2, \ldots, N_D$, and $D_k$ is a rogue device.

Figure 3.12: Representative In-Class (unfilled) PMF from Fig. 3.10 and Out-of-Class
(filled) PMF for arbitrary test statistic $z_v$. These are used to generate a Rogue Device
Rejection ROC curve for a specific claimed ID and selected threshold $t_v$.

Rogue device rejection is assessed using conventional ROC curve analysis [15].
Varying the threshold ($t_v$) shown in Fig. 3.12 and measuring the area under the curve for
each PMF determines the True Verification Rate (TVR) and Rogue Accept Rate (RAR).
TVR is a measure of “how well” current RF fingerprints match their true claimed ID and is
the area under the In-Class PMF when $z_v < t_v$. The area under the In-Class PMF is the same
as shown in Fig. 3.10. The corresponding RAR provides a measure of “how well” current
rogue RF fingerprints match a falsely claimed authorized device ID and is the area under
the Out-of-Class PMF when $z_v < t_v$. The RAR is a measure of “how often” a rogue device is
granted network access when falsely claiming the bit-level identity of an authorized network
device. Rogue Reject Rate (RRR) is defined as $RRR = 1 - RAR$; a higher RAR (lower RRR)
reflects poorer security performance. Figure 3.13 shows representative authorized device
ID verification and rogue device rejection ROC performance curves, illustrating that
setting a threshold ($t_v$) to achieve a desired TVR fixes the corresponding authorized device
false verification rate and rogue accept rate for a specific claimed ID.
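
The link between a chosen threshold and the resulting rates can be sketched in Python as shown below. The sketch picks the smallest candidate threshold that reaches a target TVR and then reads off FVR, RAR, and RRR at that same threshold. It assumes the accept rule $z_v < t_v$ used in the text; the function names and the test statistics are hypothetical.

```python
import math

def rate_below(stats, t):
    """Fraction of test statistics with z_v < t."""
    return sum(z < t for z in stats) / len(stats)

def operating_point(in_class, out_authorized, out_rogue, target_tvr):
    """Smallest threshold reaching target_tvr, with the rates it implies."""
    if not 0.0 < target_tvr <= 1.0:
        raise ValueError("target_tvr must be in (0, 1]")
    candidates = sorted(set(in_class) | set(out_authorized) | set(out_rogue))
    candidates.append(math.inf)            # guarantees TVR = 1 is reachable
    for t in candidates:
        tvr = rate_below(in_class, t)
        if tvr >= target_tvr:
            rar = rate_below(out_rogue, t)
            return {"t_v": t, "TVR": tvr,
                    "FVR": rate_below(out_authorized, t),
                    "RAR": rar, "RRR": 1.0 - rar}

# Hypothetical test statistics for one claimed ID.
in_class = [0.10, 0.20, 0.30, 0.40]        # true claims
out_authorized = [0.35, 0.50, 0.60, 0.70]  # false claims by authorized devices
out_rogue = [0.25, 0.45, 0.80, 0.90]       # false claims by a rogue device
point = operating_point(in_class, out_authorized, out_rogue, target_tvr=0.75)
```

For these numbers the selected threshold is 0.35, which yields TVR = 0.75, FVR = 0, RAR = 0.25, and RRR = 0.75, showing how one threshold simultaneously sets all of the rates.
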
Figure 3.13: Representative Authorized Device ID Verification and corresponding Rogue
Device Rejection ROC curves. Verification threshold $t_v$ is set to achieve desired authorized
device TVR and FVR which maps directly to a corresponding rogue device RAR (RRR)
for a specific claimed ID.

IV. Results and Analysis


This chapter provides results for ZigBee device discrimination, to include Device
Classification and Device ID Verification using full-dimensional and reduced
dimensional RF-DNA feature sets. The reduced dimensional subsets are obtained through
Dimensional Reduction Analysis (DRA) as described in Section 3.4 using a qualitative
phase-only feature selection process as in [11, 31] and two quantitative selection
methods, including: 1) pre-classification Kolmogorov-Smirnov (KS)-Test p-value ranking
and 2) post-classification Generalized Relevance Learning Vector Quantization-Improved
(GRLVQI) feature relevance ranking. Device Classification and Device ID Verification
are performed using the methodology discussed in Section 3.5. Section 4.1 details
how Multiple Discriminant Analysis (MDA) training was accomplished, including
the selection of Training and Testing data sets. Section 4.2 provides baseline Multiple
Discriminant Analysis, Maximum Likelihood (MDA/ML) classification performance using
full-dimensional RF-DNA fingerprints. Section 4.3 provides comparative DRA feature
selection results for the three selection methods considered. Section 4.4 provides
Device Classification results using selected DRA feature sets, and Section 4.5 provides
verification results, including Authorized Device ID Verification and Rogue Device
Rejection performance using DRA reduced feature sets.

4.1  MDA  Training  and  Model  Development 

MDA training was accomplished using NSHR=500 independent ZigBee Synchronization
Header (SHR) responses collected from each location (“CAGE”, “LOS”, and “WALL”) for
each device used for hybrid model development (Dev1, Dev2, Dev3, Dev4). In addition,
NNz=5 independent, like-filtered, Monte Carlo noise realizations were added to the SHR
responses for each analysis SNR considered. Thus, for ND=4 device MDA training, K-fold
generation of the “best” MDA model, and of the MVG statistics of the projected Training
fingerprints, is based on a total of NTng=(500 SHR)×(3 Locations)×(5 NNz)=7500
independent Training realizations per device. Results for classification are likewise
based on NSHR=500 Testing fingerprints per location for each device and NNz=5 noise
realizations per SNR, resulting in NTst=2500 Testing realizations per location (7500 per
device across the three locations). This large number of trials reduced the CI=95%
Confidence Interval (CI) bars to within the vertical extent of the plotted data markers.
Therefore, the CI=95% bars are intentionally omitted in all plots to enhance visual
clarity and qualitative assessment.
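
The noise-addition step and the realization count above can be illustrated with a minimal
sketch. It assumes real-valued samples and white Gaussian noise; the like-filtering applied
in this work is omitted, the sinusoid is a synthetic stand-in for an SHR response, and the
function names are hypothetical.

```python
import math
import random

def signal_power(samples):
    """Average power of a real-valued sample sequence."""
    return sum(s * s for s in samples) / len(samples)

def add_noise_at_snr(signal, snr_db, rng):
    """Return signal plus white Gaussian noise scaled to the target SNR (dB).

    The noise realization is rescaled so that its measured power equals
    the power required for the requested SNR.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = signal_power(signal) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / signal_power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]

def realization_count(n_shr, n_locations, n_noise):
    """Number of independent realizations per device."""
    return n_shr * n_locations * n_noise

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [math.sin(2 * math.pi * 0.05 * k) for k in range(1000)]  # synthetic stand-in
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=rng)

# Verify the achieved SNR from the added noise component
noise_only = [y - x for x, y in zip(noisy, clean)]
measured_snr_db = 10.0 * math.log10(signal_power(clean) / signal_power(noise_only))
print(round(measured_snr_db, 3))
print(realization_count(500, 3, 5))  # 500 SHR x 3 locations x 5 noise realizations
```

Rescaling each noise realization to the exact target power, rather than relying on its
expected power, keeps the analysis SNR identical across Monte Carlo realizations.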

4.2  Device  Classification:  Full-Dimensional  Performance 

Full-Dimensional RF-DNA feature sets are based on NC=3 signal characteristics (a, φ,
and f), NM=3 statistics (σ², γ, and κ), and NR+1=81 total regions. Thus, the composite
fingerprint F for each collected emission is comprised of NF=3×3×81=729 RF fingerprint
features as given by (2.14). Figure 4.1 shows the full-dimensional classification Testing
performance for the hybrid location (responses from “CAGE”, “LOS”, and “WALL”) scenario
and SNR∈[0 24] dB. An arbitrary performance benchmark of %C=90% (average across
devices) is achieved at SNR≈9.2 dB (≈10.0 dB), with all devices achieving %C=80% or
better classification at this point. Each device classification performance curve shown in
Fig. 4.1 is an average across locations (“CAGE”, “LOS”, and “WALL”).
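
The feature count can be made concrete with a short sketch of regional statistical
fingerprint generation. This is an illustration under stated assumptions, not the
processing defined by (2.14): the responses are synthetic, the NR=80 subregions are taken
as equal-length and contiguous, the extra "+1" region is assumed to span the whole
response, and all function names are hypothetical.

```python
import math

def moments(x):
    """Variance, skewness, and kurtosis of a sequence (population form)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return [0.0, 0.0, 0.0]  # constant region: higher moments undefined
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in x) / n
    kurt = sum(((v - mean) / std) ** 4 for v in x) / n
    return [var, skew, kurt]

def region_features(x, n_regions):
    """Statistics over n_regions contiguous subregions plus the full sequence."""
    size = len(x) // n_regions
    feats = []
    for r in range(n_regions):
        feats.extend(moments(x[r * size:(r + 1) * size]))
    feats.extend(moments(x))  # assumed "+1" region: the entire response
    return feats

def fingerprint(amplitude, phase, frequency, n_regions=80):
    """Concatenate regional statistics for the three signal characteristics."""
    feats = []
    for characteristic in (amplitude, phase, frequency):
        feats.extend(region_features(characteristic, n_regions))
    return feats

# Synthetic stand-in responses (not measured data)
n = 800
a = [1.0 + 0.1 * math.sin(0.3 * k) for k in range(n)]
p = [math.sin(0.05 * k) for k in range(n)]
f = [math.cos(0.11 * k) for k in range(n)]
F = fingerprint(a, p, f)
print(len(F))  # 3 characteristics x 3 statistics x 81 regions
```

With this layout each signal characteristic contributes one contiguous block of 243
features, matching the per-characteristic count used for feature selection.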

4.3  Device  Classification:  DRA  Feature  Selection 

Results in Fig. 4.1 show that the arbitrary %C=90% benchmark can be achieved for all
devices at various SNR using a full-dimensional NF=729 feature set, with average
cross-device %C=90% achieved at SNR≈10.0 dB. Feature down-selection was next performed
using DRA to determine the minimum number of features required to maintain average
cross-device %C=90%. Feature relevance was determined using RF fingerprints extracted
from emissions at SNR=10.0 dB (the SNR at which %C=90% in Fig. 4.1). Quantitative
DRA was performed using the NF=729 full-dimensional feature set with 1) pre-classification
KS-Test p-value ranking and 2) post-classification GRLVQI λi feature relevance ranking.

Figure 4.1: MDA/ML Device Classification performance using a full-dimensional
(NF=729) ZigBee feature set at indicated SNR. The cross-device average is shown and
used for subsequent comparison with DRA performance results.

Quantitative DRA enables identification and selection of feature subsets, where the
most relevant features are selected from the full-dimensional feature set. Figure 4.2 shows
the NF=729 full-dimensional ZigBee feature number indices and corresponding relevance
indicators for SNR=10.0 dB using 1) pre-classification KS-Test p-values and 2)
post-classification GRLVQI λi relevance values. Greater feature relevance is indicated
by a lower summed p-value from the KS-Test and a higher λi from the GRLVQI process.
The DRA process simply involves sorting Fig. 4.2 results to establish a rank-ordering that
can be used to select a desired number of most relevant features.
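
The KS-Test side of this rank-ordering can be sketched as follows. This is a minimal
illustration, not the implementation used in this work: it assumes the p-values are
summed over all device pairs for each feature, it uses a standard asymptotic
approximation for the two-sample KS p-value, and the data and function names are
hypothetical.

```python
import math
import random
from itertools import combinations

def ks_statistic(x, y):
    """Two-sample KS statistic: maximum distance between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        while i < len(xs) and xs[i] == v:
            i += 1
        while j < len(ys) and ys[j] == v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value (Kolmogorov series approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # series is numerically indistinguishable from 1 here
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                 for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))

def rank_features_ks(data):
    """data maps device name -> list of fingerprints (lists of feature values).

    Returns (feature indices sorted most to least relevant, summed p-values).
    A lower summed pairwise p-value indicates greater relevance.
    """
    devices = sorted(data)
    n_features = len(data[devices[0]][0])
    summed = []
    for k in range(n_features):
        total = 0.0
        for d1, d2 in combinations(devices, 2):
            x = [fp[k] for fp in data[d1]]
            y = [fp[k] for fp in data[d2]]
            total += ks_pvalue(ks_statistic(x, y), len(x), len(y))
        summed.append(total)
    order = sorted(range(n_features), key=lambda k: summed[k])
    return order, summed

# Synthetic data: feature 0 is identical across devices, feature 1 is device dependent
rng = random.Random(1)
offsets = {"Dev1": 0.0, "Dev2": 3.0, "Dev3": 6.0}
data = {dev: [[rng.gauss(0.0, 1.0), rng.gauss(off, 1.0)] for _ in range(200)]
        for dev, off in offsets.items()}
order, summed = rank_features_ks(data)
top_ranked = order[:1]  # keep the desired number of most relevant features
print(order)
```

The GRLVQI ranking follows the same sort-and-select pattern, except that the λi
relevance values are sorted in descending order.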

4.4  Device  Classification:  DRA  Performance


Previous research [11, 31] has qualitatively shown that ZigBee phase-derived features
possess greater discriminating information than either amplitude-derived or
frequency-derived features when used with an MDA/ML classifier. As detailed in Section 3.3,
the full-dimensional ZigBee feature set consists of NF=729 total features, including
NF=243 features each for amplitude, phase, and frequency. Figure 4.3 displays DRA subsets
comprised of NF=243 selected features and their corresponding indices for 1) qualitative
phase-only feature selection, 2) quantitative KS-Test top-ranked feature selection, and
3) quantitative GRLVQI top-ranked feature selection.

Figure 4.4 shows average Device Classification performance using the NF=729
full-dimensional feature set and the DRA≈66% subsets (NF=243 features retained) shown in
Fig. 4.3. Relative to full-dimensional performance, the DRA≈66% subsets yield relatively
consistent classification performance and exhibit a “gain” of G≈-1.0 dB at the %C=90%
benchmark; the “gain” metric is introduced here for comparative assessment and defined
as the difference, expressed in dB, in required SNR for two systems, methods, etc., to
achieve a specified performance %C.
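
The gain metric can be sketched as below. The sketch assumes %C curves that increase
with SNR, linear interpolation between plotted points, and the sign convention implied
above (a negative gain means the test case needs more SNR than the reference). The curve
values are synthetic, not results from this work, and the function names are hypothetical.

```python
def snr_at_benchmark(snr_db, pct_correct, benchmark=90.0):
    """Lowest SNR (dB) at which a %C curve first reaches the benchmark.

    Uses linear interpolation between adjacent points; returns None when
    the benchmark is never reached (reported as N/A).
    """
    if pct_correct[0] >= benchmark:
        return snr_db[0]
    for k in range(1, len(snr_db)):
        lo, hi = pct_correct[k - 1], pct_correct[k]
        if lo < benchmark <= hi:
            frac = (benchmark - lo) / (hi - lo)
            return snr_db[k - 1] + frac * (snr_db[k] - snr_db[k - 1])
    return None

def gain_db(reference_curve, test_curve, snr_db, benchmark=90.0):
    """Gain of the test case relative to the reference at the benchmark."""
    ref = snr_at_benchmark(snr_db, reference_curve, benchmark)
    tst = snr_at_benchmark(snr_db, test_curve, benchmark)
    if ref is None or tst is None:
        return None
    return ref - tst

# Synthetic curves for illustration only
snr = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0]
full = [40.0, 60.0, 80.0, 100.0, 100.0, 100.0, 100.0]   # reaches 90% at 10 dB
reduced = [35.0, 55.0, 75.0, 95.0, 99.0, 100.0, 100.0]  # reaches 90% at 11 dB
weak = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 85.0]       # never reaches 90%

print(gain_db(full, reduced, snr))  # -1.0
print(gain_db(full, weak, snr))     # None
```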

Further reduction of RF-DNA fingerprint dimensionality is obtained using the top-ranked
NF=200, 100, 50, and 25 features that were quantitatively selected using the 1)
pre-classification KS-Test and 2) post-classification GRLVQI relevance rankings. Figure 4.5
displays the top-ranked NF=243, 200, 100, 50, and 25 features from both quantitative DRA
methods and their corresponding index number within the full-dimensional feature set.

(a) KS-Test: Lower → Greater Relevance
(b) GRLVQI: Higher → Greater Relevance

Figure 4.2: Unsorted DRA feature relevance indicators: (a) KS-Test p-values and
(b) GRLVQI λi relevance values. Results shown here for SNR=10.0 dB which corresponds
to a cross-device %C≈90% in Fig. 4.1.

Figure 4.3: DRA Selected NF=243 subsets of full-dimensional (NF=729) feature set.
Selection based on 1) qualitative phase-only, 2) quantitative top-ranked KS-Test, and
3) quantitative top-ranked GRLVQI feature selection methods.

Figure 4.4: Average MDA/ML device classification performance using DRA selected
NF=243 feature subsets shown in Fig. 4.3. Full-dimensional NF=729 performance from
Fig. 4.1 provided for comparison.

(a) Pre-Classification KS-Test.
(b) Post-Classification GRLVQI.

Figure 4.5: Illustration of top-ranked NF=243, 200, 100, 50, and 25 DRA feature subsets
using (a) pre-classification KS-Test and (b) post-classification GRLVQI rankings.

The effect of additional feature reduction on hybrid location classification performance
is shown in Fig. 4.6 using DRA subsets containing the top-ranked NF=243, 200, 100, 50,
and 25 features that were quantitatively selected using 1) pre-classification KS-Test and
2) post-classification GRLVQI relevance rankings. Considering the previously established
%C=90% benchmark for assessing DRA classification performance, results in Fig. 4.6 show
that:

1.  The required SNR for the KS-Test top-ranked NF=243 through NF=50 feature sets
approximately spans SNR∈[10 18] dB, with the top-ranked NF=25 feature set never
achieving the %C=90% benchmark. This is an indication that the MDA model
development process is unable to achieve adequate inter-class separation and/or
sufficient intra-class spread minimization using only NF=25 features.

2.  The required SNR for the GRLVQI top-ranked NF=243 through NF=50 feature sets
approximately spans SNR∈[10 18] dB, which is consistent with KS-Test feature
selection performance. However, the GRLVQI top-ranked NF=25 feature set does
achieve the %C=90% benchmark, at SNR≈30 dB.

The KS-Test and GRLVQI feature selection performances in Fig. 4.6 are summarized
in Table 4.1, which shows the “Gain” for each DRA case relative to performance using the
DRA≈66% reduced NF=243 feature set.

(a) KS-Test Feature Selection.
(b) GRLVQI Feature Selection.

Figure 4.6: MDA/ML Device Classification performance using DRA subsets from Fig. 4.5
selected by (a) KS-Test p-values and (b) GRLVQI λi relevance values. Average NF=243
DRA performance from Fig. 4.4 provided for comparison.

Table 4.1: MDA/ML Device Classification performance “Gain” (dB) for DRA subsets in
Fig. 4.6 relative to performance using the DRA NF=243 feature subset.

DRA Method    Number of DRA Features (NF)
              243        200        100        50         25
KS-Test       0.0 dB     -0.2 dB    -2.6 dB    -6.1 dB    N/A
GRLVQI        0.0 dB     -0.3 dB    -1.9 dB    -6.75 dB   -17.8 dB

4.5  Device  ID  Verification 

Verification of a device’s claimed bit-level ID provides a means for granting authorized
devices network access while denying access to unauthorized devices. It is assumed here
that a device wanting to gain network access provides a claimed bit-level ID and that
RF-DNA features can be used to authenticate the claimed ID. The Device ID Verification
process performs a 1-to-1 comparison between a device’s current RF-DNA fingerprint
and a stored reference fingerprint for the claimed bit-level ID. Device ID verification is
accomplished here using the methodology described in Section 3.5.3 and emissions from
10 ZigBee devices, including: 1) the same ND=4 authorized devices used previously for
device classification assessment (Dev1, Dev2, Dev3, and Dev4), and 2) an additional
NR=6 unauthorized “rogue” devices (Dev5, Dev6, Dev7, Dev8, Dev9, and Dev10). The
verification process is used to assess both Authorized Device ID Verification performance
using the ND=4 authorized devices, and Rogue Device Rejection performance using the
NR=6 rogue devices. Of particular importance is that the hybrid MDA model developed in
Sect. 4.1 for Device Classification is also used here for verification assessment.

4.5.1  Authorized  Device  ID  Verification. 

Authorized device ID verification is performed using the same independent NTst=7500
projected Testing fingerprints from classification for each of the ND=4 authorized devices.
Verification performance is evaluated at SNR=18.0 dB using NF=50 DRA reduced feature
sets selected by rank ordering 1) pre-classification KS-Test p-values and 2)
post-classification GRLVQI λi relevance values.

For ROC curve generation and analysis, each of the ND authorized devices presents
a true claimed ID for itself, as well as a false claimed ID for each of the other authorized
devices (e.g., Dev1 presents a claimed ID for Dev1, Dev2, Dev3, and Dev4). For a
specific claimed bit-level ID, NTst=7500 projected Testing fingerprints from each of the
ND authorized devices are used to generate (NTst=7500)×(ND=4)=30000 normalized
Multivariate Gaussian posterior probability test statistics according to (3.14). The
collection of test statistics is used to create the In-Class and Out-of-Class Probability Mass
Functions (PMFs) described in Section 3.5.3 for the specific claimed ID. For example, the
In-Class PMF is constructed from 7500 test statistics where the current RF-DNA fingerprint
is indeed from the true claimed device ID; this same In-Class PMF is subsequently used
for Rogue Device Rejection assessment in Sect. 4.5.2. The associated Out-of-Class PMF
is constructed from 22500 test statistics where the current RF-DNA fingerprint is from a
falsely claimed device ID. Representative PMFs are presented in Fig. 4.7 for one specific
case where all ND=4 authorized devices present claimed bit-level IDs for Dev2. The
resultant In-Class and Out-of-Class PMFs are used to produce one Authorized Device ID
Verification Receiver Operating Characteristic (ROC) curve.
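
The step from In-Class and Out-of-Class test statistics to a ROC curve can be sketched as
follows. The sketch assumes that a larger test statistic indicates a closer match, so a
claimed ID is verified when the statistic meets or exceeds the threshold; it operates on
the raw statistics rather than binned PMFs, which yields the same rates. The statistic
values are synthetic and the function names are hypothetical.

```python
def roc_points(in_class, out_class, thresholds):
    """Return (FVR, TVR) pairs, one per threshold.

    TVR: fraction of true claims verified (In-Class statistics).
    FVR: fraction of false claims verified (Out-of-Class statistics).
    """
    points = []
    for t in thresholds:
        tvr = sum(z >= t for z in in_class) / len(in_class)
        fvr = sum(z >= t for z in out_class) / len(out_class)
        points.append((fvr, tvr))
    return points

def meets_benchmark(points, min_tvr=0.90, max_fvr=0.10):
    """True if some threshold gives TVR > min_tvr and FVR < max_fvr."""
    return any(tvr > min_tvr and fvr < max_fvr for fvr, tvr in points)

# Synthetic test statistics for one claimed ID (illustration only)
in_class = [0.91, 0.95, 0.99, 0.88, 0.97, 0.93, 0.85, 0.96, 0.98, 0.80]
out_class = [0.05, 0.10, 0.20, 0.02, 0.40, 0.15, 0.30, 0.70, 0.08, 0.01]
thresholds = [k / 20 for k in range(21)]  # 0.00, 0.05, ..., 1.00

points = roc_points(in_class, out_class, thresholds)
print(points[15])               # rates at threshold 0.75
print(meets_benchmark(points))
```

In the assessment described above, the In-Class list would hold 7500 statistics and the
Out-of-Class list 22500 statistics for each claimed ID.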

Figure 4.8 shows Authorized Device ID Verification performance for each of the ND=4
authorized ZigBee devices for a DRA reduced feature set of NF=50 features selected using
1) pre-classification KS-Test p-values, and 2) post-classification GRLVQI relevance rankings.
The verification ROC curves were generated at SNR=18 dB which corresponds to the
%C=90% benchmark in Fig. 4.6 using the same feature set. The ND=4 ROC curves show
that there exists a device-dependent verification threshold tV(m) such that all authorized
device IDs can be verified at a True Verification Rate of TVR>90% and a False Verification
Rate of FVR<10% for both methods considered.

(a) In-Class PMF: Device 2, 7,500 Testing RF-DNA fingerprints.
(b) Out-of-Class PMF: Devices (1,3,4), 22,500 Testing RF-DNA fingerprints.

Figure 4.7: In-Class and Out-of-Class PMFs for Claimed ID = Device 2. Generated from
test statistic zV in (3.14) for KS-Test top-ranked NF=50 features at SNR=18 dB.

(a) KS-Test Feature Selection.
(b) GRLVQI Feature Selection.

Figure 4.8: ZigBee Authorized Device ID Verification for ND=4 authorized devices
operating at SNR=18.0 dB (%C≈90% in Fig. 4.6) using top-ranked NF=50 features from
(a) pre-classification KS-Test and (b) post-classification GRLVQI selection methods.

4.5.2  Rogue  Device  Rejection.

The ability to use RF-DNA to reject unauthorized rogue devices presenting false
bit-level identities is demonstrated using the same ID verification process used for
authorized devices. Rogue Device Rejection is an assessment of “how well” current RF-DNA
fingerprints from a pool of rogue (previously unseen and unauthorized) devices match
RF-DNA fingerprints associated with the claimed ID of an authorized device. This is
demonstrated here using NR=6 (Dev5, Dev6, Dev7, Dev8, Dev9, Dev10) unauthorized
rogue devices whose emissions were collected under various conditions (“CAGE”, “LOS”,
and “WALL”). A total of NTst=(1000 SHR)×(1 Location)×(5 NNz)=5000 previously
unseen RF-DNA fingerprint realizations were used for each of the NR devices. Table 4.2
lists the 9 ZigBee device ID and collection condition combinations that were considered
using the NR=6 rogue devices. For each of the 9 different combinations, the rogue device
presented a claimed ID for each of the ND=4 authorized devices, producing a total of 36
Rogue Device Rejection scenarios.


Table 4.2: Nine ZigBee Device ID and collection condition combinations used for Assessing
Rogue Device Rejection capability. Grey cells correspond to untested combinations.

For a specific claimed bit-level ID, NTst=5000 projected Testing fingerprints from a
rogue device are used to generate 5000 test statistics using (3.14). The collection of test
statistics is used to construct the Out-of-Class PMF, which is used with the corresponding
claimed ID In-Class PMF generated as part of the Authorized Device ID Verification
process in Sect. 4.5.1. The resultant PMFs are used to produce one ROC performance
curve. As detailed in the following two subsections, Rogue Device Rejection capability
was assessed using each of the DRA feature selection methods.

4.5.2.1  KS-Test  Selected  Features.

Results for Rogue Device Rejection assessment using the KS-Test DRA selected
features are presented in Fig. 4.9. These results include the 36 rogue scenarios using
top-ranked NF=25, 50, 100 feature sets at SNR=18.0 dB. These are conventional ROC curves
presented as True Verification Rate (TVR) versus Rogue Accept Rate (RAR), where Rogue
Reject Rate is defined as RRR=1-RAR; a higher RAR (lower RRR) reflects greater rogue
access and poorer network security performance. Authorized Device ID Verification ROC
curves are provided alongside the rogue device ID ROC curves to enable identification of
the fixed threshold that achieves authorized device TVR>90% and direct mapping to the
corresponding RAR (RRR) for each rogue scenario. The solid black curves in Fig. 4.9 (b),
(d), and (f) correspond to rogue scenarios that achieve an arbitrary RAR<10% (RRR>90%)
performance benchmark when the threshold is fixed such that TVR>90%. As indicated,
performance using NF=25, 50, 100 KS-Test feature sets achieved the arbitrary RRR>90%
benchmark in 21, 29, and 30 out of the 36 rogue scenarios, respectively. Table 4.3
through Table 4.5 highlight rogue scenarios which fail to achieve the arbitrary RRR>90%
performance benchmark using selected DRA feature subsets.
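
The mapping from a fixed authorized-device threshold to a per-scenario rogue accept rate
can be sketched as below. This is an illustration under stated assumptions: a claim is
accepted when the test statistic meets or exceeds the threshold, the threshold is the
largest In-Class statistic value giving a TVR of at least 90%, and the statistic values,
scenario keys, and function names are hypothetical.

```python
def threshold_for_tvr(in_class, min_tvr=0.90):
    """Largest threshold, drawn from the In-Class statistics, with TVR >= min_tvr."""
    for t in sorted(in_class, reverse=True):
        tvr = sum(z >= t for z in in_class) / len(in_class)
        if tvr >= min_tvr:
            return t
    return min(in_class)

def rogue_accept_rate(rogue_stats, threshold):
    """RAR: fraction of rogue claims accepted at the given threshold."""
    return sum(z >= threshold for z in rogue_stats) / len(rogue_stats)

def count_rejected(in_class_by_id, rogue_by_scenario, min_tvr=0.90, max_rar=0.10):
    """Count scenarios with RAR < max_rar (RRR > 1 - max_rar) at the fixed threshold.

    rogue_by_scenario maps (rogue device, condition, claimed ID) -> statistics.
    """
    passed = 0
    for (rogue, condition, claimed_id), stats in rogue_by_scenario.items():
        t = threshold_for_tvr(in_class_by_id[claimed_id], min_tvr)
        if rogue_accept_rate(stats, t) < max_rar:
            passed += 1
    return passed

# Synthetic statistics (illustration only)
in_class_by_id = {
    "Dev1": [0.91, 0.95, 0.99, 0.88, 0.97, 0.93, 0.85, 0.96, 0.98, 0.80],
}
rogue_by_scenario = {
    ("Dev5", "CAGE", "Dev1"): [0.1, 0.2, 0.3, 0.05, 0.4, 0.6, 0.2, 0.1, 0.3, 0.86],
    ("Dev6", "LOS", "Dev1"): [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.5],
}

t = threshold_for_tvr(in_class_by_id["Dev1"])
rar = rogue_accept_rate(rogue_by_scenario[("Dev5", "CAGE", "Dev1")], t)
print(t, rar, 1.0 - rar)  # threshold, RAR, RRR
print(count_rejected(in_class_by_id, rogue_by_scenario))
```

With 9 device and condition combinations and 4 claimed IDs, the scenario dictionary would
hold 36 entries, and the returned count corresponds to the "out of 36" figures reported.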

Figure 4.9: Performance using KS-Test selected features (NF=25, 50, 100) for ND=4
authorized devices and NR=6 unauthorized rogue devices in various operating scenarios
falsely claiming each of the ND=4 authorized device IDs (36 total rogue scenarios). Grey
ROC curves correspond to rogue scenarios where RAR<10% (RRR>90%) is not achieved.

Table 4.3: ZigBee device ID and collection condition combinations from Table 4.2 where
Rogue Device Rejection performance in Fig. 4.9 fails to meet RAR<10% (RRR>90%) with
NF=25 features selected using KS-Test DRA at SNR=18 dB. The numbers correspond to
the Rogue device claimed ID and indicate failure for 15 of 36 rogue scenarios.

[Table body: columns ZigBee ID, CAGE, LOS, WALL; rows Dev5 through Dev10. Cell contents
not recoverable, apart from a claimed ID of 1 listed for Dev8 and for Dev9.]

Table 4.4: ZigBee device ID and collection condition combinations from Table 4.2 where
Rogue Device Rejection performance in Fig. 4.9 fails to meet RAR<10% (RRR>90%) with
NF=50 features selected using KS-Test DRA at SNR=18 dB. The numbers correspond to
the Rogue device claimed ID and indicate failure for 7 of 36 rogue scenarios.

Table 4.5: ZigBee device ID and collection condition combinations from Table 4.2 where
Rogue Device Rejection performance in Fig. 4.9 fails to meet RAR<10% (RRR>90%) with
NF=100 features selected using KS-Test DRA at SNR=18 dB. The numbers correspond to
the Rogue device claimed ID and indicate failure for 6 of 36 rogue scenarios.

4.5.2.2  GRLVQI  Selected  Features.

Results for Rogue Device Rejection assessment using the GRLVQI DRA selected
features are presented in Fig. 4.10. These results include the 36 rogue scenarios using
top-ranked NF=25, 50, 100 feature sets at SNR=18.0 dB. As with KS-Test results presented in
Sect. 4.5.2.1, an arbitrary RRR>90% benchmark is used for comparative assessment at an
Authorized Device ID Verification operating point of TVR>90%. The solid black curves in
Fig. 4.10 (b), (d), and (f) correspond to rogue scenarios that achieve the arbitrary RRR>90%
benchmark for a fixed threshold yielding TVR>90%. As indicated, performance using
NF=25, 50, 100 GRLVQI feature sets achieved the arbitrary RRR>90% benchmark in 23,
28, and 30 out of the 36 rogue scenarios, respectively. Table 4.6 through Table 4.8 highlight
rogue scenarios which fail to achieve the arbitrary RRR>90% performance benchmark
using selected DRA feature subsets.

(c) Authorized ID Verification: NF=50.
(d) Rogue Device Rejection: NF=50.

Figure 4.10: Performance using GRLVQI selected features (NF=25, 50, 100) for ND=4
authorized devices and NR=6 unauthorized rogue devices in various operating scenarios
falsely claiming each of the ND=4 authorized device IDs (36 total rogue scenarios). Grey
ROC curves correspond to rogue scenarios where RAR<10% (RRR>90%) is not achieved.

Table 4.6: ZigBee device ID and collection condition combinations from Table 4.2 where
Rogue Device Rejection performance in Fig. 4.10 fails to meet RAR<10% (RRR>90%) with
NF=25 features selected using GRLVQI DRA at SNR=18 dB. The numbers correspond to
the Rogue device claimed ID and indicate failure for 13 of 36 rogue scenarios.

Table 4.7: ZigBee device ID and collection condition combinations from Table 4.2 where
Rogue Device Rejection performance in Fig. 4.10 fails to meet RAR<10% (RRR>90%) with
NF=50 features selected using GRLVQI DRA at SNR=18 dB. The numbers correspond to
the Rogue device claimed ID and indicate failure for 8 of 36 rogue scenarios.

Table 4.8: ZigBee device ID and collection condition combinations from Table 4.2 where
Rogue Device Rejection performance in Fig. 4.10 fails to meet RAR<10% (RRR>90%) with
NF=100 features selected using GRLVQI DRA at SNR=18 dB. The numbers correspond
to the Rogue device claimed ID and indicate failure for 6 of 36 rogue scenarios.

V.  Summary  and  Conclusions


This  chapter  provides  a  summary  of  research  activities,  research  contributions,  and 
recommendations  for  further  research. 

5.1  Summary 

This research was conducted to expand AFIT’s RF “Distinct Native Attribute”
(RF-DNA) fingerprinting process to support IEEE 802.15.4 ZigBee communication system
applications. ZigBee-based wireless networks are energy efficient, low complexity,
low cost, and widely deployed in many applications, including energy management and
efficiency; home, building, and industrial control automation; and home area networks, to
name a few [14, 45, 46]. As ZigBee networks continue to increase in popularity, higher
levels of security become essential for protecting sensitive personal information
and physical system access. The particular security concern addressed in this research
is the exploitation of bit-level device identities (IDs) to gain unauthorized network access.

To  counter  bit-level  “spoofing”  attacks,  RF-DNA  fingerprints  are  extracted  from 
Physical  (PHY)  waveform  features  and  used  to  achieve  human-like  discrimination  of 
ZigBee  network  devices  in  a  typical  operational  environment.  By  designating  certain 
devices  as  authorized  and  others  as  unauthorized,  ZigBee  network  vulnerability  to
outsider  threats  is  assessed  using  Receiver  Operating  Characteristic  (ROC)  curves  to 
characterize  both  Authorized  Device  ID  Verification  performance  (granting  network  access 
to  authorized  users  presenting  true  bit-level  credentials)  and  Rogue  Device  Rejection 
performance  (denying  network  access  to  unauthorized  rogue  devices  presenting  false  bit- 
level  credentials). 

For demonstrations here, emissions were collected from TI CC2420 ZigBee devices
operating under three environmental scenarios: 1) “CAGE”: devices and collection receiver
antenna both in an anechoic chamber, 2) “LOS”: devices within Line-of-Sight of the
collection receiver antenna, and 3) “WALL”: devices placed behind a wall relative to the
collection receiver antenna. For each device, RF-DNA fingerprint features were extracted
from a “hybrid” pool of emissions containing emissions from each of the operational
environments. The hybrid features were used for Multiple Discriminant Analysis (MDA)
model development, and Maximum Likelihood (ML) Device Classification was performed using
both full-dimensional and Dimensional Reduction Analysis (DRA) reduced dimensional
RF-DNA fingerprints. The DRA reduced sets were selected using 1) a pre-classification
Kolmogorov-Smirnov (KS)-Test process and 2) a post-classification Generalized Relevance
Learning Vector Quantization-Improved (GRLVQI) feature relevance ranking process. The
same hybrid MDA/ML model was used in a verification process for assessing Authorized
Device ID Verification and Rogue Device Rejection. In both cases, devices attempt to gain
network access by providing bit-level ID credentials (ZigBee MAC address); authorized
devices present true bit-level IDs while rogue devices present false bit-level IDs matching
authorized device IDs. The 1 vs. 1 verification process extracts an RF-DNA fingerprint
from a current device emission and compares it with the stored RF-DNA fingerprint for the
claimed ID. Network access is granted (rightly or wrongly) based on a measure of similarity
(test statistic) that provides a “Looks how much like?” assessment of the two RF-DNA
fingerprints.

5.2  Conclusions 

Using device RF-DNA features remains a viable alternative for augmenting bit-level
security protocols. This is supported by results here which show that RF-DNA from IEEE
802.15.4 ZigBee emissions can be used as inputs to an MDA/ML discrimination process
to perform reliable 1 vs. many “Looks most like?” classification assessment, as well as
1 vs. 1 “Looks how much like?” verification assessment. Performance was first assessed
with an MDA/ML model developed using features from a “hybrid” pool of emissions from
ND=4 devices and full-dimensional RF-DNA fingerprints comprised of NF=729 features.
Device Classification performance achieved an arbitrary benchmark of average correct
classification %C>90% (across all devices) for SNR>10.0 dB, with individual devices
achieving %C>80% at this same SNR.

The full-dimensional NF=729 feature set was reduced using DRA and the resultant
classification and verification performance assessed. The top-ranked NF=243 ZigBee feature
subset was qualitatively selected according to related work in [31], and quantitatively
selected using two methods, including: 1) pre-classification KS-Test p-value ranking [12, 31],
and 2) post-classification GRLVQI λi relevance ranking [12, 33, 36]. Hybrid MDA/ML
Device Classification performance using these DRA≈66% reduced subsets was marginally
poorer than full-dimensional performance and reflected a “gain” of G≈-1.0 dB at the
%C=90% benchmark; gain is defined herein as the reduction in required SNR, expressed
in dB, between two systems, methods, etc., to achieve a given %C performance. Thus, the
implementation trade-off is a 66% reduction in the number of features (computational
complexity, storage, etc., reduction) at the expense of requiring approximately 1.0 dB of
additional SNR (improved channel conditions).

Additional quantitative KS-Test and GRLVQI DRA feature selection was performed
and classification performance assessed using the top-ranked NF=200, 100, 50, and 25
features. Relative to the %C>90% benchmark [12]:

1.  The KS-Test selected feature sets required SNR≈10.0 dB (NF=243) to SNR≈18.0 dB
(NF=50), with results for NF=25 failing to meet the benchmark.

2.  The GRLVQI selected feature sets required the same SNR≈10.0 dB (NF=243)
to SNR≈18.0 dB (NF=50), with results for NF=25 achieving the benchmark at
SNR≈30.0 dB.

Hybrid MDA/ML verification performance was assessed for 1) ND=4 authorized
network devices and 2) NR=6 unauthorized (rogue) network devices. Performance was
evaluated using the NF=50 DRA feature set at SNR=18.0 dB given that the %C=90%
benchmark was achieved under these conditions. ROC curve analysis for Authorized
Device ID Verification indicated that there exists a device-dependent threshold tV(m) for all
authorized devices such that a True Verification Rate of TVR>90% and False Verification
Rate of FVR<10% are realized for both DRA methods; this range of TVR and FVR was
arbitrarily selected for comparative assessment.

Rogue Device Rejection capability was assessed using NR=6 unauthorized devices
placed in nine collection combinations of various experimental “CAGE”, “LOS”, and
“WALL” locations, with each rogue device falsely presenting a claimed ID matching
each of the ND=4 authorized IDs, for a total of 36 rogue assessment scenarios. Considering
an arbitrary Rogue Reject Rate of RRR>90%, ROC curve analysis for Rogue Device
Rejection indicated that performance using KS-Test and GRLVQI selected feature sets was
consistent. Specific performance included [12]:

1.  The KS-Test selected feature sets achieving RRR>90% in 21, 29, and 30 of the 36
rogue scenarios using NF=25, 50, and 100 top-ranked features, respectively.

2.  The GRLVQI selected feature sets achieving RRR>90% in 23, 28, and 30 of the 36
rogue scenarios using NF=25, 50, and 100 top-ranked features, respectively.

5.3  Recommendations  for  Future  Research 

This research provides a proof-of-concept demonstration that highlights the promise
for augmenting ZigBee bit-level security mechanisms. This was done using RF-DNA
features with an MDA/ML discrimination process. The work here is by no means complete
and there are several potential directions that future research could take:

1.  Performing a detailed assessment of ZigBee GRLVQI DRA: Results here for
dimensionally reduced feature sets were based on two separate rank-ordering and
selection methods (pre-classification KS-Test and post-classification GRLVQI) being
developed in parallel under AFIT’s RF Intelligence (RFINT) program. GRLVQI
parameter settings and model development were not optimized for ZigBee emissions
as part of this research. Further analysis and GRLVQI optimization could be done to
better exploit feature set dependence, or independence, as collection location varies
(“CAGE”, “LOS”, “WALL”) and environmental conditions change.

2.  Increasing the number of model training devices: An iterative process should be
considered for progressively expanding the pool of authorized devices being used
for model development. The less than perfect Rogue Device Rejection performance
here (RRR≠100%) was not too surprising given that 1) MDA model development
is a classification-based versus verification-based optimization process and similar
results have been observed using other signals, and 2) only ND=4 authorized devices
were used for hybrid MDA/ML model development; it is highly unlikely that RF-DNA
features from ND=4 members of a larger population (thousands or
even millions) accurately capture population behavior and provide broad human-like
discrimination. Increasing the sample size (training devices) will allow the developed
models to better represent the larger device population.

3.  Considering alternate test statistics: Results here are based exclusively on inherent
MATLAB functionality for implementing MDA model development and performing
ML classification assessment (classify function), as well as ROC curve (roc function)
verification performance assessment; the inherent normalized MVG posterior
probability similarity measure was used exclusively as the test statistic. There
are a myriad of additional probability-based, as well as distance-based, similarity
measures that could be considered and which may improve overall performance.